Abstract

We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in $\mathbb{R}^D$, given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is $n^{-2/(2+d)}$. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded.
Keywords: Manifold learning, Minimax estimation. 

1. Introduction 

We consider the problem of estimating a manifold M given noisy observations near the manifold. The observed data are a random sample $Y_1,\ldots,Y_n$ where $Y_i \in \mathbb{R}^D$. The model for the data is

$$Y_i = \xi_i + Z_i \qquad (1)$$

where $\xi_1,\ldots,\xi_n$ are unobserved variables drawn from a distribution supported on a manifold M with dimension d < D. The noise variables $Z_1,\ldots,Z_n$ are drawn from a distribution F. Our main assumption is that M is a compact, d-dimensional, smooth Riemannian submanifold in $\mathbb{R}^D$; the precise conditions on M are given in Section 2.1.
A manifold M and a distribution for $(\xi, Z)$ induce a distribution $Q = Q_M$ for Y. In Section 2.2 we define a class of such distributions

$$\mathcal{Q} = \{Q_M : M \in \mathcal{M}\} \qquad (2)$$



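where $\mathcal{M}$ is a set of manifolds. Given two sets A and B, the Hausdorff distance between A and B is

$$H(A,B) = \inf\{\epsilon : A \subset B \oplus \epsilon \text{ and } B \subset A \oplus \epsilon\} \qquad (3)$$

where

$$A \oplus \epsilon = \bigcup_{x \in A} B_D(x,\epsilon) \qquad (4)$$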

and $B_D(x,\epsilon)$ is an open ball in $\mathbb{R}^D$ centered at x with radius $\epsilon$. We are interested in the minimax risk

$$R_n(\mathcal{Q}) = \inf_{\hat M}\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\hat M, M)] \qquad (5)$$

where the infimum is over all estimators $\hat M$. By an estimator $\hat M$ we mean a measurable function of $Y_1,\ldots,Y_n$ taking values in the set of all manifolds. Our first main result is the following minimax lower bound, which is proved in Section 3.

Theorem 1 Under the conditions given in Section 2, there is a constant $C_1 > 0$ such that, for all large n,

$$\inf_{\hat M}\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\hat M, M)] \ge C_1\left(\frac{1}{n}\right)^{2/(2+d)} \qquad (6)$$

where the infimum is over all estimators $\hat M$.



Thus, no method of estimating M can have an expected Hausdorff distance smaller than the stated bound. Note that the rate depends on d but not on D, even though the support of the distribution Q for Y has dimension D. Our second result is the following upper bound, which is proved in Section 4.

Theorem 2 Under the conditions given in Section 2, there exists an estimator $\hat M$ such that, for all large n,

$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\hat M, M)] \le C_2\left(\frac{\log n}{n}\right)^{2/(2+d)} \qquad (7)$$

for some $C_2 > 0$.



Thus the rate is tight, up to logarithmic factors. The estimator in Theorem 2 is of theoretical interest because it establishes that the lower bound is tight. However, the estimator constructed in the proof of that theorem is not practical, so in Section 5 we construct a very simple estimator $\hat M$ such that

$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\hat M, M)] \le C\left(\frac{\log n}{n}\right)^{1/D}. \qquad (8)$$



Minimax Manifold Estimation 



This is slower than the minimax rate, but the estimator is computationally very simple and 
requires no knowledge of d or the smoothness of M. 
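As a quick numerical illustration of the three rates above, the Python sketch below evaluates the right-hand sides of (6), (7) and (8) with every constant set to 1. Setting the constants to 1 is an assumption made only for illustration, since the theorems fix the rates up to unspecified constants. The first two quantities depend on d alone, while the rate of the simple estimator depends on D alone.

```python
import math


def minimax_lower(n, d):
    """Right-hand side of (6) with C_1 = 1."""
    return n ** (-2 / (2 + d))


def minimax_upper(n, d):
    """Right-hand side of (7) with C_2 = 1."""
    return (math.log(n) / n) ** (2 / (2 + d))


def simple_estimator(n, D):
    """Right-hand side of (8) with C = 1."""
    return (math.log(n) / n) ** (1 / D)


n = 10 ** 6
for d, D in [(1, 3), (1, 30), (2, 30)]:
    print(f"d={d} D={D}: lower={minimax_lower(n, d):.2e} "
          f"upper={minimax_upper(n, d):.2e} simple={simple_estimator(n, D):.2e}")
```

Increasing D from 3 to 30 with d fixed leaves the first two numbers unchanged and only inflates the last one.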



Related Work. There is a vast literature on manifold estimation. Much of the literature deals with using manifolds for the purpose of dimension reduction; see, for example, Baraniuk and Wakin (2007) and references therein. We are interested instead in actually estimating the manifold itself. There is a large literature on this problem in the field of computational geometry; see, for example, Dey (2006), Dey and Goswami (2004), Chazal and Lieutier (2008), Cheng and Dey (2005) and Boissonnat and Ghosh (2010). However, very few papers allow for noise in the statistical sense, by which we mean observations drawn randomly from a distribution. In the literature on computational geometry, observations are called noisy if they depart from the underlying manifold in a very specific way: the observations have to be close to the manifold but not too close to each other. This notion of noise is quite different from random sampling from a distribution. An exception is Niyogi et al. (2008), who constructed the following estimator. Let $I = \{i : \hat p(Y_i) > \lambda\}$ where $\hat p$ is a density estimator. They define $\hat M = \bigcup_{i \in I} B_D(Y_i, \epsilon)$ and show that if $\lambda$ and $\epsilon$ are chosen properly, then $\hat M$ is homologous to M. (This means that $\hat M$ and M share certain topological properties.) However, the result does not guarantee closeness in Hausdorff distance. Note that $\bigcup_{i=1}^n B_D(Y_i, \epsilon)$ is precisely the Devroye-Wise estimator for the support of a distribution (Devroye and Wise (1980)).
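The two set estimators just described are easy to state in code. The Python sketch below (using NumPy) tests membership in a union of open balls $B_D(Y_i,\epsilon)$, which with all sample points as centers is the Devroye-Wise estimator, and selects the centers $\{Y_i : \hat p(Y_i) > \lambda\}$ used by Niyogi et al. (2008). The Gaussian kernel density estimate, its bandwidth, the threshold value and the function names are illustrative assumptions, not choices taken from either paper.

```python
import numpy as np


def in_union_of_balls(query, centers, eps):
    """Boolean mask: which query points lie in the union of open balls B_D(c, eps)."""
    # (n_query, n_centers) matrix of Euclidean distances
    dists = np.linalg.norm(query[:, None, :] - centers[None, :, :], axis=-1)
    return (dists < eps).any(axis=1)


def high_density_centers(sample, bandwidth, lam):
    """Return the sample points Y_i with p_hat(Y_i) > lam, p_hat a Gaussian KDE (illustrative choice)."""
    n, D = sample.shape
    sq = np.sum((sample[:, None, :] - sample[None, :, :]) ** 2, axis=-1)
    norm_const = (2 * np.pi * bandwidth ** 2) ** (D / 2)
    p_hat = np.exp(-sq / (2 * bandwidth ** 2)).sum(axis=1) / (n * norm_const)
    return sample[p_hat > lam]


# noiseless points on the unit circle plus one far-away outlier
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
sample = np.vstack([np.column_stack([np.cos(theta), np.sin(theta)]), [[5.0, 5.0]]])

dw = in_union_of_balls(np.array([[1.0, 0.0], [0.0, 0.0]]), sample, eps=0.2)
kept = high_density_centers(sample, bandwidth=0.3, lam=0.05)
print(dw, kept.shape)  # the outlier is dropped from the thresholded centers
```

The thresholded centers can then be passed back to `in_union_of_balls` to evaluate the union-of-balls estimator built from the high-density points only.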



Notation. Given a set S, we denote its boundary by $\partial S$. We let $B_D(x,r)$ denote a D-dimensional open ball centered at x with radius r. If A is a set and x is a point, then we write $d(x,A) = \inf_{y \in A}\|x - y\|$ where $\|\cdot\|$ is the Euclidean norm. Let

$$A \,\triangle\, B = (A \cap B^c) \cup (A^c \cap B) \qquad (9)$$

denote the symmetric set difference between sets A and B.
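For finite point sets, the infimum in (3) equals the larger of the two directed distances $\max_{a \in A} d(a,B)$ and $\max_{b \in B} d(b,A)$, with $d(x,A)$ as just defined. The Python sketch below computes the Hausdorff distance this way. Representing sets by finite samples, such as points on a circle, is an assumption of the sketch; for manifolds the result only approximates H up to the sampling resolution.

```python
import numpy as np


def point_to_set(x, A):
    """d(x, A) = min over the rows y of A of ||x - y|| (A finite)."""
    return np.min(np.linalg.norm(A - x, axis=1))


def hausdorff(A, B):
    """Hausdorff distance (3) between finite point sets A and B (rows are points)."""
    d_ab = max(point_to_set(a, B) for a in A)   # how far A sticks out of B
    d_ba = max(point_to_set(b, A) for b in B)   # how far B sticks out of A
    return max(d_ab, d_ba)


theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(hausdorff(circle, 1.1 * circle))   # concentric circles of radii 1 and 1.1: about 0.1
print(hausdorff(circle[:1], circle))     # a single point versus the whole circle: about 2
```

The second example shows why both directed distances are needed: the single point lies on the circle, yet the circle is far from being contained in a small neighborhood of the point.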

The uniform measure on a manifold M is denoted by $\mu_M$. Lebesgue measure on $\mathbb{R}^k$ is denoted by $\nu_k$. In case k = D, we sometimes write V instead of $\nu_D$; in other words, V(A) is simply the volume of A. Any integral of the form $\int f$ is understood to be the integral with respect to Lebesgue measure on $\mathbb{R}^D$. If P and Q are two probability measures on $\mathbb{R}^D$ with densities p and q, then the Hellinger distance between P and Q is

$$h(P,Q) = h(p,q) = \sqrt{\int\left(\sqrt p - \sqrt q\right)^2} = \sqrt{2\left(1 - \int\sqrt{pq}\right)} \qquad (10)$$

where the integrals are with respect to $\nu_D$. Recall that

$$h^2(p,q) \le \ell_1(p,q) \le 2h(p,q) \qquad (11)$$

where $\ell_1(p,q) = \int|p - q|$. Let $p(x) \wedge q(x) = \min\{p(x), q(x)\}$. The affinity between P and Q is

$$\|P \wedge Q\| = \int p \wedge q = 1 - \frac{1}{2}\int|p - q|. \qquad (12)$$
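The identities and inequalities (10) to (12) are easy to spot-check numerically. The Python sketch below does so for random probability vectors on a five-point space, where sums over the points replace integrals with respect to $\nu_D$. The discrete setting is an assumption made only for illustration.

```python
import numpy as np


def hellinger(p, q):
    """h(p, q) = sqrt(sum (sqrt p - sqrt q)^2) for probability vectors."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def l1(p, q):
    """l_1(p, q) = sum |p - q|."""
    return np.sum(np.abs(p - q))


def affinity(p, q):
    """||P ^ Q|| = sum of min(p, q)."""
    return np.sum(np.minimum(p, q))


rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    h = hellinger(p, q)
    assert np.isclose(h ** 2, 2 * (1 - np.sum(np.sqrt(p * q))))      # second form of (10)
    assert h ** 2 <= l1(p, q) + 1e-12 and l1(p, q) <= 2 * h + 1e-12  # inequality (11)
    assert np.isclose(affinity(p, q), 1 - 0.5 * l1(p, q))            # identity (12)
print("checks of (10)-(12) passed")
```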
Let $P^n$ denote the n-fold product measure based on n independent observations from P. In the appendix (Section 7.1) we show that

$$\|P^n \wedge Q^n\| \ge \frac{1}{2}\left(1 - \frac{1}{2}\int|p - q|\right)^{2n}. \qquad (13)$$
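Inequality (13) can be checked exactly in a toy case. For $P = \mathrm{Bernoulli}(a)$ and $Q = \mathrm{Bernoulli}(b)$ on {0, 1}, all sequences with the same number of ones have the same probability under each product measure, so $\|P^n \wedge Q^n\|$ reduces to a sum over that number. The Python sketch below compares the exact affinity with the right-hand side of (13), using $\int|p - q| = 2|a - b|$. The Bernoulli family is an illustrative assumption; sums replace integrals on this discrete space.

```python
from math import comb


def product_affinity_bernoulli(a, b, n):
    """Exact ||P^n ^ Q^n|| for P = Bernoulli(a), Q = Bernoulli(b), grouping sequences by their number of ones."""
    return sum(comb(n, k) * min(a ** k * (1 - a) ** (n - k), b ** k * (1 - b) ** (n - k))
               for k in range(n + 1))


def bound_13(a, b, n):
    """Right-hand side of (13); the l_1 distance between the two Bernoulli laws is 2|a - b|."""
    return 0.5 * (1 - 0.5 * 2 * abs(a - b)) ** (2 * n)


for n in (1, 10, 100):
    print(n, product_affinity_bernoulli(0.5, 0.55, n), bound_13(0.5, 0.55, n))
```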



We write $X_n = O_P(a_n)$ to mean that, for every $\epsilon > 0$ there exists $C > 0$ such that $P(|X_n|/a_n > C) < \epsilon$ for all large n. Throughout, we use symbols like $C, C_0, C_1, c, c_0, c_1, \ldots$ to denote generic positive constants whose values may be different in different expressions.



2. Model Assumptions 

2.1 Manifold Conditions 

We shall be concerned with d-dimensional compact Riemannian submanifolds without boundary embedded in $\mathbb{R}^D$ with d < D. (Informally, this means that M looks like $\mathbb{R}^d$ in a small neighborhood around any point in M.) We assume that M is contained in some compact set $\mathcal{K} \subset \mathbb{R}^D$.

At each $u \in M$ let $T_uM$ denote the tangent space to M and let $T_u^\perp M$ be the normal space. We can regard $T_uM$ as a d-dimensional hyperplane in $\mathbb{R}^D$, and we can regard $T_u^\perp M$ as the $(D - d)$-dimensional hyperplane perpendicular to $T_uM$. Define the fiber of size a at u to be $L_a(u) = L_a(u,M) = T_u^\perp M \cap B_D(u,a)$.

Let $\Delta(M)$ be the largest r such that each point in $M \oplus r$ has a unique projection onto M. The quantity $\Delta(M)$ will be small if either M is highly curved or M is close to being self-intersecting. Let $\mathcal{M} = \mathcal{M}(\kappa)$ denote all d-dimensional manifolds embedded in $\mathcal{K}$ such that $\Delta(M) \ge \kappa$. Throughout this paper, $\kappa$ is a fixed positive constant. The quantity $\Delta(M)$ has been rediscovered many times. It is called the condition number in Niyogi et al. (2006), the thickness in Gonzalez and Maddocks (1999) and the reach in Federer (1959).

An equivalent definition of $\Delta(M)$ is the following: $\Delta(M)$ is the largest number r such that the fibers $L_r(u)$ never intersect. See Figure 1. Note that if M is a sphere then $\Delta(M)$ is just the radius of the sphere, and if M is a linear space then $\Delta(M) = \infty$. Also, if $a < \Delta(M)$ then $M \oplus a$ is the disjoint union of its fibers:

$$M \oplus a = \bigcup_{u \in M} L_a(u). \qquad (14)$$

Define $\mathrm{tube}(M,a) = \bigcup_{u \in M} L_a(u)$. Thus, if $a < \Delta(M)$ then $M \oplus a = \mathrm{tube}(M,a)$.
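Let $p, q \in M$. The angle between two tangent spaces $T_p$ and $T_q$ is defined to be

$$\mathrm{angle}(T_p,T_q) = \cos^{-1}\left(\min_{u \in T_p}\max_{v \in T_q}|\langle u - p, v - q\rangle|\right) \qquad (15)$$

where $\langle u, v\rangle$ is the usual inner product in $\mathbb{R}^D$ and $u - p$, $v - q$ range over unit vectors. Let $d_M(p,q)$ denote the geodesic distance between $p, q \in M$.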
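Definition (15) can be evaluated with standard linear algebra. If $T_p$ and $T_q$ are viewed as linear subspaces of directions (translating p and q to the origin) with orthonormal bases $Q_u$ and $Q_v$, then the min-max in (15) equals the smallest singular value of $Q_u^\top Q_v$, so the angle is the largest principal angle between the two subspaces. The Python sketch below uses this fact; representing tangent spaces by spanning matrices and the function name are assumptions of the sketch.

```python
import numpy as np


def tangent_space_angle(U, V):
    """angle(T_p, T_q) from (15), with T_p and T_q spanned by the columns of U and V.

    For equal-dimensional subspaces, min over unit u in span(U) of max over unit v
    in span(V) of |<u, v>| is the smallest singular value of Qu^T Qv.
    """
    Qu, _ = np.linalg.qr(U)   # orthonormal basis of span(U)
    Qv, _ = np.linalg.qr(V)   # orthonormal basis of span(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))


t = np.deg2rad(40.0)
xy_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
tilted = np.array([[1.0, 0.0], [0.0, np.cos(t)], [0.0, np.sin(t)]])  # xy-plane rotated about the x-axis
print(np.rad2deg(tangent_space_angle(xy_plane, tilted)))  # about 40 degrees
```

The two planes share the x-axis direction, but (15) takes the worst direction, which is the one rotated by 40 degrees.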



We now summarize some useful results from Niyogi et al. (2006).

Lemma 3 Let $M \subset \mathcal{K}$ be a manifold and suppose that $\Delta(M) = \kappa > 0$. Let $p, q \in M$.








Figure 1: The condition number $\Delta(M)$ of a manifold is the largest number $\kappa$ such that the normals to the manifold do not cross as long as they are not extended beyond $\kappa$. The plot on the left shows a one-dimensional manifold (a curve) and some normals of length $r < \kappa$. The plot on the right shows the same manifold and some normals of length $r > \kappa$.



1. Let $\gamma$ be a geodesic connecting p and q with unit speed parameterization. Then the curvature of $\gamma$ is bounded above by $1/\kappa$.

2. $\cos(\mathrm{angle}(T_p,T_q)) \ge 1 - d_M(p,q)/\kappa$. Thus, $\mathrm{angle}(T_p,T_q) \le \sqrt{2d_M(p,q)/\kappa} + o\left(\sqrt{d_M(p,q)/\kappa}\right)$.

3. If $a = \|p - q\| \le \kappa/2$ then $d_M(p,q) \le \kappa - \kappa\sqrt{1 - 2a/\kappa} = a + o(a)$.

4. If $a = \|p - q\| \le \kappa/2$ then $a \ge d_M(p,q) - (d_M(p,q))^2/(2\kappa)$.

5. If $\|q - p\| \ge \epsilon$ and $v \in B_D(q,\epsilon) \cap T_q^\perp M \cap B_D(p,\kappa)$ then $\|v - p\| \le \epsilon^2/\kappa$.



6. Fix any $\delta > 0$. There exist points $x_1,\ldots,x_N \in M$ such that $M \subset \bigcup_{j=1}^N B_D(x_j,\delta)$ and such that $N \le (c/\delta)^d$.


For further information about manifolds, see Lee (2002).
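Parts 3, 4 and 6 of Lemma 3 can be sanity-checked on the simplest manifold with $\Delta(M) = \kappa$: a circle of radius $\kappa$ in $\mathbb{R}^2$, for which $\|p - q\| = 2\kappa\sin(d_M(p,q)/(2\kappa))$. The Python sketch below verifies the two inequalities on a grid of arc lengths and counts the centers of a greedy $\delta$-net, whose size should grow like $1/\delta$ since d = 1. The choice of manifold, the grid and the greedy covering are illustrative assumptions; the greedy net only covers a dense finite sample of the circle.

```python
import numpy as np

kappa = 2.0  # a circle of radius kappa in R^2 has Delta(M) = kappa


def chord(s):
    """Euclidean distance ||p - q|| between points at geodesic distance s on the circle."""
    return 2 * kappa * np.sin(s / (2 * kappa))


# arcs whose chord stays strictly below kappa / 2, as required in parts 3 and 4
s = np.linspace(1e-4, 0.99 * 2 * kappa * np.arcsin(0.25), 500)
a = chord(s)
part3_holds = bool(np.all(s <= kappa - kappa * np.sqrt(1 - 2 * a / kappa)))
part4_holds = bool(np.all(a >= s - s ** 2 / (2 * kappa)))


def greedy_net_size(delta, m=20_000):
    """Number of centers in a greedy delta-net of a dense sample of the circle."""
    theta = np.linspace(0, 2 * np.pi, m, endpoint=False)
    pts = kappa * np.column_stack([np.cos(theta), np.sin(theta)])
    uncovered = np.ones(m, dtype=bool)
    count = 0
    while uncovered.any():
        i = np.flatnonzero(uncovered)[0]                       # first point not yet covered
        uncovered &= np.linalg.norm(pts - pts[i], axis=1) >= delta
        count += 1
    return count


print(part3_holds, part4_holds)
net_sizes = {delta: greedy_net_size(delta) for delta in (0.2, 0.1, 0.05)}
for delta, size in net_sizes.items():
    print(delta, size, size * delta)  # size * delta stays roughly constant
```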



2.2 Distributional Assumptions 

The distribution of Y is induced by the distribution of $\xi$ and Z. We will assume that $\xi$ is drawn uniformly on the manifold. Then we assume that Z is drawn uniformly on the normal to M. More precisely, given $\xi$, we draw Z uniformly on $L_\sigma(\xi)$. In other words, the noise is perpendicular to the manifold. The result is that, if $\sigma < \kappa$, then the distribution $Q = Q_M$ of Y has support equal to $M \oplus \sigma$.
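To make the sampling model concrete, the Python sketch below draws from it when M is a circle of radius R in the xy-plane of $\mathbb{R}^3$ (so d = 1 and D = 3, and each normal space is two-dimensional). Here $\xi$ is uniform on the circle and Z is uniform on the disk of radius $\sigma$ in the normal space at $\xi$, spanned by the radial direction and the z-axis. The specific manifold, the function names and the parameter values are illustrative assumptions.

```python
import numpy as np


def sample_model(n, R, sigma, rng):
    """Draw Y_i = xi_i + Z_i for M a circle of radius R in the xy-plane of R^3."""
    theta = rng.uniform(0, 2 * np.pi, n)
    radial = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])
    e3 = np.array([0.0, 0.0, 1.0])
    xi = R * radial                                   # uniform on the circle
    r = sigma * np.sqrt(rng.uniform(0, 1, n))         # sqrt makes the point uniform on the disk
    phi = rng.uniform(0, 2 * np.pi, n)
    Z = (r * np.cos(phi))[:, None] * radial + (r * np.sin(phi))[:, None] * e3
    return xi + Z


def dist_to_circle(y, R):
    """Distance from points y in R^3 to the circle of radius R in the xy-plane."""
    rho = np.linalg.norm(y[:, :2], axis=1)
    return np.sqrt((rho - R) ** 2 + y[:, 2] ** 2)


rng = np.random.default_rng(1)
Y = sample_model(100_000, R=1.0, sigma=0.2, rng=rng)
dist = dist_to_circle(Y, 1.0)
print(dist.max(), np.mean(dist ** 2))  # max below sigma; mean of dist^2 close to sigma^2 / 2
```

Every draw stays within distance $\sigma$ of M, in line with the support being $M \oplus \sigma$.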

The distributional assumption on $\xi$ is not critical. Any smooth density bounded away from 0 on the manifold will lead to similar results. However, the assumption on the noise Z is critical. We have chosen the simplest noise distribution here. (Perpendicular noise is also assumed in Niyogi et al. (2008).) In current work, we are deriving the rates for more complicated noise distributions. The rates are quite different and the proofs are more complex. Those results will be reported elsewhere.
The set of distributions we consider is as follows. Let $\kappa$ and $\sigma$ be fixed positive numbers such that $0 < \sigma < \kappa$. Let

$$\mathcal{Q} = \mathcal{Q}(\kappa,\sigma) = \{Q_M : M \in \mathcal{M}(\kappa)\}. \qquad (16)$$

For any $M \in \mathcal{M}(\kappa)$ consider the corresponding distribution $Q_M$, supported on $S_M = M \oplus \sigma$. Let $q_M$ be the density of $Q_M$ with respect to Lebesgue measure. We now show that $q_M$ is bounded above and below by a uniform density.

Recall that the essential supremum and essential infimum of $q_M$ over a set A are defined by

$$\operatorname*{ess\,sup}_{y \in A} q_M = \inf\{a \in \mathbb{R} : \nu_D(\{y : q_M(y) > a\} \cap A) = 0\}$$

and

$$\operatorname*{ess\,inf}_{y \in A} q_M = \sup\{a \in \mathbb{R} : \nu_D(\{y : q_M(y) < a\} \cap A) = 0\}.$$

Also recall that, by the Lebesgue differentiation theorem, $q_M(y) = \lim_{\epsilon \to 0} Q_M(B_D(y,\epsilon))/V(B_D(y,\epsilon))$ for almost all y. Let $U_M$ be the uniform distribution on $M \oplus \sigma$ and let $u_M = 1/V(M \oplus \sigma)$ be the density of $U_M$. Note that, for $A \subset M \oplus \sigma$, $U_M(A) = V(A)/V(M \oplus \sigma)$.

Lemma 4 There exist constants $0 < C_* \le C^* < \infty$, depending only on $\kappa$ and d, such that

$$C_* \le \inf_{M \in \mathcal{M}}\operatorname*{ess\,inf}_{y \in S_M}\frac{q_M(y)}{u_M(y)} \le \sup_{M \in \mathcal{M}}\operatorname*{ess\,sup}_{y \in S_M}\frac{q_M(y)}{u_M(y)} \le C^*. \qquad (17)$$

Proof Choose any $M \in \mathcal{M}(\kappa)$. Let x be any point in the interior of $S_M$. Let $B = B_D(x,\epsilon)$ where $\epsilon > 0$ is small enough so that $B \subset S_M = M \oplus \sigma$. Let y be the projection of x onto M. We want to upper and lower bound $Q(B)/V(B)$. Then we will take the limit as $\epsilon \to 0$. Consider the two spheres of radius $\kappa$ tangent to M at y in the direction of the line between x and y. (See Figure 2.) Note that Q(B) is maximized by taking M to be equal to the upper sphere and Q(B) is minimized by taking M to be equal to the lower sphere. Let us consider first the case where M is equal to the upper sphere. Let

$$U = \{u \in M : L_\sigma(u) \cap B \ne \emptyset\}$$

be the projection of B onto M. By simple geometry, $U = M \cap B_D(y, r\epsilon)$ where

$$\left(1 + \frac{\sigma}{\kappa}\right)^{-1} \le r \le 1 + \frac{\sigma}{\kappa}.$$

Let Vol denote d-dimensional volume on M. Then $\mathrm{Vol}(B_D(y,r\epsilon) \cap M) \le c_1 r^d\epsilon^d\omega_d$, where $\omega_d$ is the volume of a unit d-ball and $c_1$ depends only on $\kappa$ and d. To see this, note that because M is a manifold and $\Delta(M) \ge \kappa$, it follows that near y, M may be locally parameterized as a smooth function $f = (f_1,\ldots,f_{D-d})$ over $B \cap T_yM$. The surface area of the graph of f over $B_D(y,r\epsilon) \cap T_yM$ is bounded by $\int_{B_D(y,r\epsilon) \cap T_yM}\sqrt{1 + \sum_i\|\nabla f_i\|^2}$, whose integrand is bounded by a constant $c_1$ uniformly over M. Hence, $\mathrm{Vol}(B_D(y,r\epsilon) \cap M) \le c_1\,\mathrm{Vol}(B_D(y,r\epsilon) \cap T_yM) = c_1 r^d\epsilon^d\omega_d$.
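Let $A_M$ be the uniform distribution on M and let $\Gamma_u$ denote the uniform measure on $L_\sigma(u)$. Note that, for $u \in U$, $L_\sigma(u) \cap B$ is a $(D - d)$-ball whose radius is at most $\epsilon$. Hence,

$$\Gamma_u(L_\sigma(u) \cap B) \le \frac{\epsilon^{D-d}\omega_{D-d}}{\sigma^{D-d}\omega_{D-d}} = \left(\frac{\epsilon}{\sigma}\right)^{D-d}.$$

Thus,

$$Q_M(B) = \int_M \Gamma_u(B \cap L_\sigma(u))\,dA_M(u) = \int_U \Gamma_u(B \cap L_\sigma(u))\,dA_M(u) \le \left(\frac{\epsilon}{\sigma}\right)^{D-d} A_M(U)$$

$$= \left(\frac{\epsilon}{\sigma}\right)^{D-d}\frac{\mathrm{Vol}(B_D(y,r\epsilon) \cap M)}{\mathrm{Vol}(M)} \le \left(\frac{\epsilon}{\sigma}\right)^{D-d}\frac{c_1 r^d\epsilon^d\omega_d}{\mathrm{Vol}(M)} \le \left(\frac{\epsilon}{\sigma}\right)^{D-d}\frac{c_1\epsilon^d(1 + \sigma/\kappa)^d\omega_d}{\mathrm{Vol}(M)}.$$

Now, $U_M(B) = V(B)/V(M \oplus \sigma) = \epsilon^D\omega_D/(\sigma^{D-d}\,\mathrm{Vol}(M))$. Hence,

$$\frac{Q_M(B)}{U_M(B)} \le \frac{c_1(1 + \sigma/\kappa)^d\,\omega_d}{\omega_D}.$$

Taking limits as $\epsilon \to 0$ we have that $q_M(y) \le C^* u_M(y)$ for almost all y.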

The proof of the lower bound is similar to the upper bound except for the following changes: let $U_0$ denote all $u \in U$ such that the radius of $B \cap L_\sigma(u)$ is at least $\epsilon/2$. Then $A_M(U_0) \ge A_M(U)(1 - O(\epsilon))$ and the projection of $U_0$ onto M is again of the form $B_D(y,r\epsilon) \cap M$. By Lemma 5.3 of Niyogi et al. (2006), $\mathrm{Vol}(B_D(y,r) \cap M) \ge \left(1 - r^2/(4\kappa^2)\right)^{d/2}\omega_d r^d$, and the latter is bounded below by a constant multiple of $\omega_d r^d$ for all small $\epsilon$. Also, $\Gamma_u(L_\sigma(u) \cap B) \ge (\epsilon/(2\sigma))^{D-d}$ for all $u \in U_0$. ■

Of course, an immediate consequence of the above lemma is that, for every $M \in \mathcal{M}(\kappa)$ and every measurable set A, $C_* U_M(A) \le Q_M(A) \le C^* U_M(A)$.
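For intuition about the constants in Lemma 4, consider the simplest case d = 1, D = 2, where M is a circle of radius $R \ge \kappa$ and the noise is uniform on the normal segment of half-length $\sigma$. In polar coordinates the density of Y at distance $\rho$ from the center is $1/(4\pi\sigma\rho)$, while $u_M = 1/(4\pi R\sigma)$, so $q_M/u_M = R/\rho$ lies between $\kappa/(\kappa + \sigma)$ and $\kappa/(\kappa - \sigma)$ for every such circle. The Python sketch below checks this by Monte Carlo. The circle family, the number of shells and the simulation size are assumptions made for illustration.

```python
import numpy as np


def ratio_on_shells(R, sigma, n_bins=20, n=400_000, seed=0):
    """Monte Carlo estimate of q_M / u_M on thin radial shells for a circle of radius R in R^2.

    Only |Y| = R + t with t ~ Uniform(-sigma, sigma) matters, because the density is
    rotation invariant. Returns the shell midpoints and the estimated ratios.
    """
    rng = np.random.default_rng(seed)
    rho = R + rng.uniform(-sigma, sigma, n)            # distance of Y from the center
    edges = np.linspace(R - sigma, R + sigma, n_bins + 1)
    counts, _ = np.histogram(rho, bins=edges)
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    q_hat = counts / (n * shell_area)                  # estimated density of Y on each shell
    u_M = 1.0 / (4 * np.pi * R * sigma)                # uniform density on the annulus
    mids = 0.5 * (edges[1:] + edges[:-1])
    return mids, q_hat / u_M


kappa, sigma = 1.0, 0.5
for R in (1.0, 1.5, 3.0):                              # circles with Delta(M) = R >= kappa
    mids, ratio = ratio_on_shells(R, sigma)
    print(R, ratio.min(), ratio.max(), kappa / (kappa + sigma), kappa / (kappa - sigma))
```

The extreme ratios occur at R = κ, which matches the role of the two tangent spheres of radius κ in the proof.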



3. Minimax Lower Bound 

In this section we derive a lower bound on the minimax rate of convergence for this problem. We will make use of the following result due to Le Cam (1973). The following version is from Lemma 1 of Yu (1997).



Lemma 5 (Le Cam 1973) Let $\mathcal{Q}$ be a set of distributions. Let $\theta(Q)$ take values in a metric space with metric $\rho$. Let $Q_0, Q_1 \in \mathcal{Q}$ be any pair of distributions in $\mathcal{Q}$. Let $Y_1,\ldots,Y_n$ be drawn iid from some $Q \in \mathcal{Q}$ and denote the corresponding product measure by $Q^n$. Let $\hat\theta(Y_1,\ldots,Y_n)$ be any estimator. Then

$$\sup_{Q \in \mathcal{Q}}\mathbb{E}_{Q^n}\left[\rho(\hat\theta(Y_1,\ldots,Y_n),\theta(Q))\right] \ge \rho(\theta(Q_0),\theta(Q_1))\,\|Q_0^n \wedge Q_1^n\|. \qquad (18)$$

To get a useful bound from Le Cam's lemma, we need to construct an appropriate pair $Q_0$ and $Q_1$. This is the topic of the next subsection.
Figure 2: Figure for the proof of Lemma 4. x is a point in the support $M \oplus \sigma$, and y is the projection of x onto M. The two spheres are tangent to M at y and have radius $\kappa$.



3.1 A Geometric Construction 

In this section, we construct a pair of manifolds $M_0, M_1 \in \mathcal{M}(\kappa)$ and corresponding distributions $Q_0, Q_1$ for use in Le Cam's lemma. An informal description is as follows. Roughly speaking, $M_0$ and $M_1$ minimize the Hellinger distance $h(Q_0,Q_1)$ subject to their Hausdorff distance $H(M_0,M_1)$ being equal to a given value $\gamma$. Let

$$M_0 = \{(u_1,\ldots,u_d,0,\ldots,0) : -1 \le u_i \le 1,\ 1 \le i \le d\} \qquad (19)$$

be a d-dimensional hyperplane in $\mathbb{R}^D$. Hence $\Delta(M_0) = \infty$. Place a hypersphere of radius $\kappa$ below $M_0$. Push the sphere upwards into $M_0$, causing a bump of height $\gamma$ at the origin. This creates a new manifold $\tilde M_0$ such that $H(M_0,\tilde M_0) = \gamma$. However, $\tilde M_0$ is not smooth. We will roll a sphere of radius $\kappa$ around $\tilde M_0$ to get a smooth manifold $M_1$ as in Figure 3. The formal details of the construction are in Section 7.2.



Theorem 6 Let $\gamma$ be a small positive number. Let $M_0$ and $M_1$ be as defined in Section 7.2. Let $Q_i$ be the corresponding distributions on $M_i \oplus \sigma$ for i = 0, 1. Then:

1. $\Delta(M_i) \ge \kappa$, i = 0, 1.

2. $H(M_0,M_1) = \gamma$.

3. $\int|q_0 - q_1| = O(\gamma^{(d+2)/2})$.

Proof See the appendix (Section 7). ■
Figure 3: A sphere of radius $\kappa$ is pushed upwards into the plane $M_0$ (panel A). The resulting manifold $\tilde M_0$ is not smooth (panel B). A sphere is then rolled around the manifold (panel C) to produce a smooth manifold $M_1$ (panel D).
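A back-of-the-envelope calculation suggests where the exponent in part 3 of Theorem 6 comes from; it is a heuristic only, not the argument of the appendix. A cap of height $\gamma$ cut from a sphere of radius $\kappa$ has a footprint of radius $\sqrt{2\kappa\gamma - \gamma^2} \approx \sqrt{2\kappa\gamma}$, so the supports of $Q_0$ and $Q_1$ differ on a region whose volume is roughly $\gamma$ times a d-dimensional footprint of volume of order $\gamma^{d/2}$. Since both densities are bounded (Lemma 4), $\int|q_0 - q_1|$ then scales like $\gamma^{1 + d/2} = \gamma^{(d+2)/2}$. The Python sketch below fits the exponent of this proxy numerically; the proxy itself, which ignores the smoothing step and all constants, is an assumption.

```python
import numpy as np


def proxy_l1(gamma, kappa, d):
    """Heuristic proxy for the l_1 distance: bump height times footprint radius ** d."""
    footprint = np.sqrt(2 * kappa * gamma - gamma ** 2)  # radius of a cap of height gamma
    return gamma * footprint ** d


def fitted_exponent(kappa, d):
    """Slope of log(proxy) against log(gamma) over small gamma."""
    gammas = np.logspace(-6, -3, 7)
    slope, _ = np.polyfit(np.log(gammas), np.log(proxy_l1(gammas, kappa, d)), 1)
    return slope


for d in (1, 2, 3):
    print(d, fitted_exponent(kappa=1.0, d=d), (d + 2) / 2)
```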
3.2 Proof of the Lower Bound

Now we are in a position to prove the first theorem. Let us first restate the theorem.

Theorem 1. There is a constant $C > 0$ such that, for all large n,

$$\inf_{\hat M}\sup_{Q \in \mathcal{Q}}\mathbb{E}_Q[H(\hat M, M)] \ge C n^{-2/(2+d)} \qquad (20)$$

where the infimum is over all estimators $\hat M$.



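Proof of Theorem 1. Let $M_0$ and $M_1$ be as defined in Section 3.1. Let $Q_i$ be the uniform distribution on $M_i \oplus \sigma$, i = 0, 1. Let $q_i$ be the density of $Q_i$ with respect to Lebesgue measure $\nu_D$, i = 0, 1. Then, from Theorem 6, $H(M_0,M_1) = \gamma$ and $\int|q_0 - q_1| = O(\gamma^{(d+2)/2})$. Le Cam's lemma then gives, for any $\hat M$,

$$\sup_{Q \in \mathcal{Q}}\mathbb{E}_{Q^n}[H(\hat M, M)] \ge H(M_0,M_1)\,\|Q_0^n \wedge Q_1^n\| \ge \frac{\gamma}{2}\left(1 - C\gamma^{(d+2)/2}\right)^{2n},$$

where we used equation (13). Setting $\gamma = n^{-2/(d+2)}$ gives $C\gamma^{(d+2)/2} = C/n$, so the right-hand side equals $\frac{1}{2}n^{-2/(d+2)}(1 - C/n)^{2n}$; since $(1 - C/n)^{2n} \to e^{-2C} > 0$, this yields the result. ■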
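To see that the last display produces a bound of order $n^{-2/(d+2)}$, the Python sketch below evaluates $\frac{\gamma}{2}(1 - C\gamma^{(d+2)/2})^{2n}$ at $\gamma = n^{-2/(d+2)}$ and divides by $n^{-2/(d+2)}$; the ratio approaches $e^{-2C}/2$. The values C = 1 and d = 2 are arbitrary illustrative assumptions.

```python
import math


def lecam_bound(n, d, C=1.0):
    """(gamma / 2) * (1 - C * gamma^{(d+2)/2})^{2n} evaluated at gamma = n^{-2/(d+2)}."""
    gamma = n ** (-2 / (d + 2))
    return 0.5 * gamma * (1 - C * gamma ** ((d + 2) / 2)) ** (2 * n)


d, C = 2, 1.0
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    ratio = lecam_bound(n, d, C) / n ** (-2 / (d + 2))
    print(n, ratio, 0.5 * math.exp(-2 * C))
```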



4. Upper bound 

To establish the upper bound, we will construct an estimator that achieves the appropriate rate. The estimator is intended only for the theoretical purpose of establishing the rate. (A simpler but non-optimal method is discussed in Section 5.) Recall that $\mathcal{M} = \mathcal{M}(\kappa)$ is the set of all d-dimensional submanifolds M contained in $\mathcal{K}$ such that $\Delta(M) \ge \kappa > 0$. Before proceeding, we need to discuss sieve maximum likelihood.



Sieve Maximum Likelihood. Let $\mathcal{P}$ be any set of distributions such that each $P \in \mathcal{P}$ has a density p with respect to Lebesgue measure $\nu_D$. Recall that h denotes Hellinger distance. A set of pairs of functions $\mathcal{B} = \{(\ell_1,u_1),\ldots,(\ell_N,u_N)\}$ is an $\epsilon$-Hellinger bracketing for $\mathcal{P}$ if (i) for each $p \in \mathcal{P}$ there is an $(\ell,u) \in \mathcal{B}$ such that $\ell(y) \le p(y) \le u(y)$ for all y and (ii) $h(\ell,u) \le \epsilon$ for each $(\ell,u) \in \mathcal{B}$. The logarithm of the size of the smallest $\epsilon$-bracketing is called the bracketing entropy and is denoted by $H_{[]}(\epsilon,\mathcal{P},h)$.



We will make use of the following result, which is Example 4 of Shen and Wong (1995).

Theorem 7 (Shen and Wong (1995)) Let $\epsilon_n$ solve the equation $H_{[]}(\epsilon_n,\mathcal{P},h) = n\epsilon_n^2$. Let $(\ell_1,u_1),\ldots,(\ell_N,u_N)$ be an $\epsilon_n$-bracketing where $\log N = H_{[]}(\epsilon_n,\mathcal{P},h)$. Define the set of densities $S^* = \{p_1^*,\ldots,p_N^*\}$ where $p_t^* = u_t/\int u_t$. Let $\hat p^*$ maximize the likelihood $\prod_{i=1}^n p_t^*(Y_i)$ over the set $S^*$. Then

$$\sup_{P \in \mathcal{P}} P^n\left(\{h(p,\hat p^*) > \epsilon_n\}\right) \le c_1 e^{-c_2 n\epsilon_n^2}. \qquad (21)$$



The sequence $\{S^*\}$ in Theorem 7 is called a sieve and the estimator $\hat p^*$ is called a sieve maximum likelihood estimator. The estimator $\hat p^*$ need not be in $\mathcal{P}$. We will actually need an estimator that is contained in $\mathcal{P}$. We may construct one as follows. Let $\hat p^*$ be the sieve mle corresponding to $S^*$. Then $\hat p^* = p_t^*$ for some t. Let $(\ell,u) = (\ell_t,u_t)$ be the corresponding bracket.

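Lemma 8 Assume the conditions in Theorem 7. Let $\hat p$ be any density in $\mathcal{P}$ such that $\ell \le \hat p \le u$. If $\epsilon_n \le 1$ then

$$\sup_{P \in \mathcal{P}} P^n\left(\{h(p,\hat p) > c\epsilon_n\}\right) \le c_1 e^{-c_2 n\epsilon_n^2}. \qquad (22)$$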

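Proof By the triangle inequality, $h(p,\hat p) \le h(p,\hat p^*) + h(\hat p,\hat p^*) = h(p,\hat p^*) + h(\hat p, u_t/\int u_t)$, where $\hat p^* = u_t/\int u_t$ for some t. From Theorem 7, $h(p,\hat p^*) \le \epsilon_n$ with high probability. Thus we need to show that $h(\hat p, u_t/\int u_t) \le C\epsilon_n$. It suffices to show that, in general, $h(p, u/\int u) \le C\,h(\ell,u)$ whenever $\ell \le p \le u$.

Let $(\ell,u)$ be a bracket and let $\delta^2 = h^2(\ell,u) \le 1$. Let $\ell \le p \le u$. We claim that $h^2(p, u/\int u) \le 4\delta^2$. (Taking $\delta = \epsilon_n$ then proves the result.) Let $c^2 = \int u$. Then

$$1 \le c^2 = \int u = \int p + \int(u - p) = 1 + \ell_1(u,p) \le 1 + \ell_1(u,\ell) \le 1 + 2h(u,\ell) = 1 + 2\delta.$$

Now,

$$h^2\left(p, \frac{u}{\int u}\right) = \int\left(\frac{\sqrt u}{c} - \sqrt p\right)^2 = \frac{1}{c^2}\int\left(\sqrt u - c\sqrt p\right)^2 \le \int\left(\sqrt u - c\sqrt p\right)^2$$

$$= \int\left((\sqrt u - \sqrt p) + (1 - c)\sqrt p\right)^2 \le 2\int(\sqrt u - \sqrt p)^2 + 2(c - 1)^2$$

$$\le 2\delta^2 + 2\left(\sqrt{1 + 2\delta} - 1\right)^2 \le 2\delta^2 + 2\delta^2 = 4\delta^2,$$

where we used $\int(\sqrt u - \sqrt p)^2 \le \int(\sqrt u - \sqrt\ell)^2 = \delta^2$ (because $\ell \le p \le u$) and, in the last step, $\sqrt{1 + 2\delta} \le 1 + \delta$. ■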

In light of the above result, we define the modified sieve maximum likelihood estimator $\hat p$ to be any $\hat p \in \mathcal{P}$ such that $\ell \le \hat p \le u$. For simplicity, in the rest of the paper, we refer to the modified sieve estimator $\hat p$ simply as the maximum likelihood estimator (mle).
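The key inequality in the proof of Lemma 8, $h^2(p, u/\int u) \le 4h^2(\ell,u)$ whenever $\ell \le p \le u$, can be spot-checked numerically. The Python sketch below does so on a ten-point sample space (sums replace integrals), with randomly generated narrow brackets around a random density. The bracket widths and the sample space are illustrative assumptions.

```python
import numpy as np


def h2(a, b):
    """Squared Hellinger-type distance sum (sqrt a - sqrt b)^2 for nonnegative vectors."""
    return float(np.sum((np.sqrt(a) - np.sqrt(b)) ** 2))


rng = np.random.default_rng(2)
worst = 0.0
for _ in range(2000):
    p = rng.dirichlet(np.ones(10))              # a density on ten points
    lower = p * rng.uniform(0.8, 1.0, 10)       # bracket (lower, upper) with lower <= p <= upper
    upper = p * rng.uniform(1.0, 1.2, 10)
    delta2 = h2(lower, upper)                   # delta^2 = h^2(lower, upper)
    lhs = h2(p, upper / upper.sum())            # h^2(p, u / int u)
    worst = max(worst, lhs / delta2)
print(worst)  # the proof of Lemma 8 asserts this ratio never exceeds 4
```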



Outline of proof. 



We are now ready to find an estimator $\hat M$ that converges at the optimal rate (up to logarithmic terms). Our strategy for estimating M has the following steps:

Step 1. We split the data into two halves.

Step 2. Let $\hat Q$ be the maximum likelihood estimator using the first half of the data. Define $\hat M$ to be the corresponding manifold. We call $\hat M$ the pilot estimator. We show that $\hat M$ is a consistent estimator of M that converges at a sub-optimal rate $a_n = n^{-2/(D(d+2))}$. To show this we:

a. Compute the Hellinger bracketing entropy of $\mathcal{Q}$. (Theorem 9, Lemmas 10 and 11.)
b. Establish the rate of convergence of the mle in Hellinger distance, using the bracketing entropy and Theorem 7.

c. Relate the Hausdorff distance to the Hellinger distance and hence establish the rate of convergence $a_n$ of the mle in Hausdorff distance. (Lemma 13.)

d. Conclude that the true manifold is contained, with high probability, in $\mathcal{M}_n = \{M \in \mathcal{M}(\kappa) : H(\hat M, M) \le a_n\}$ (Lemma 14). Hence, we can now restrict attention to $\mathcal{M}_n$.

Step 3. To improve the pilot estimator, we need to control the relationship between Hellinger and Hausdorff distance, and thus need to work over small sets on which the manifold cannot vary too greatly. Hence, we cover the pilot estimator with long, thin slabs $R_1,\ldots,R_N$. We do this by first covering $\hat M$ with spheres $\Xi_1,\ldots,\Xi_N$ of radius $\delta_n = O((\log n/n)^{1/(2+d)})$. We define a slab $R_j$ to be the union of fibers of size $b = \sigma + a_n$ within one of the spheres: $R_j = \bigcup_{x \in \hat M \cap \Xi_j} L_b(x,\hat M)$. We then show that:

a. The set of fibers on $\hat M$ covers each $M \in \mathcal{M}_n$ in a nice way. In particular, if $M \in \mathcal{M}_n$ then each fiber from $\hat M$ is nearly normal to M. (Lemma 15.)

b. As M cuts through a slab, it stays nearly parallel to $\hat M$. Roughly speaking, M behaves like a smooth, nearly linear function within each slab. (Lemma 16.)

Step 4. Using the second half of the data, we apply maximum likelihood within each slab. This defines estimators $\hat M_j$, for $1 \le j \le N$. We show that:

a. The entropy of the set of distributions within a slab is very small. (Lemma 18.)

b. Because the entropy is small, the maximum likelihood estimator within a slab converges fairly quickly in Hellinger distance. The rate is $\epsilon_n = (\log n/n)^{1/(2+d)}$. (Lemma 19.)

c. Within a slab, there is a tight relationship between Hellinger distance and Hausdorff distance. Specifically, $H(M_1,M_2) \le c\,h^2(Q_1,Q_2)$. (Lemma 20.)

d. Steps (4b) and (4c) imply that $H(M \cap R_j, \hat M_j) = O_P(\epsilon_n^2) = O_P((\log n/n)^{2/(d+2)})$.



Step 5. Finally we define $\hat M = \bigcup_{j=1}^N \hat M_j$ and show that $\hat M$ converges at the optimal rate because each $\hat M_j$ does within its own slab.

The reason for getting a preliminary estimator and then covering the estimator with 
thin slabs is that, within a slab, there is a tight relationship between Hellinger distance and 
Hausdorff distance. This is not true globally but only in thin slabs. Maximum likelihood 
is optimal with respect to Hellinger distance. Within a slab, this allows us to get optimal 
rates in Hausdorff distance. 
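As a toy picture of Step 3, suppose the pilot estimate $\hat M$ is a circle of radius $\hat r$ in $\mathbb{R}^2$, so that d = 1, D = 2 and the fibers are radial segments. The Python sketch below covers $\hat M$ with balls of radius $\delta_n$ around equally spaced centers and records which slabs $R_j$ each point belongs to. The circular pilot, the equally spaced centers and the function name are hypothetical choices for illustration; the construction in the text applies to general pilot manifolds.

```python
import numpy as np


def circle_slab_membership(y, r_hat, delta_n, b):
    """Hypothetical sketch: slab memberships when the pilot estimate is a circle of radius r_hat.

    Slab j is the union of fibers L_b(x) (radial segments) over pilot points x in the
    ball of radius delta_n around the j-th center. Returns a boolean
    (n_points, n_slabs) matrix. Assumes b < r_hat and that no point of y is the origin.
    """
    n_centers = int(np.ceil(2 * np.pi * r_hat / delta_n))
    angles = 2 * np.pi * np.arange(n_centers) / n_centers
    centers = r_hat * np.column_stack([np.cos(angles), np.sin(angles)])
    norms = np.linalg.norm(y, axis=1)
    proj = r_hat * y / norms[:, None]            # foot of the fiber through y
    on_fiber = np.abs(norms - r_hat) < b         # y lies on some fiber of size b
    near = np.linalg.norm(proj[:, None, :] - centers[None, :, :], axis=-1) < delta_n
    return near & on_fiber[:, None]


rng = np.random.default_rng(3)
angle = rng.uniform(0, 2 * np.pi, 500)
radius = rng.uniform(0.6, 1.4, 500)
y = radius[:, None] * np.column_stack([np.cos(angle), np.sin(angle)])
member = circle_slab_membership(y, r_hat=1.0, delta_n=0.1, b=0.3)
inside = np.abs(radius - 1.0) < 0.3
print(member.sum(axis=1)[inside].min())  # every point within b of the pilot lies in some slab
```

Neighboring slabs overlap, so a point may belong to more than one slab; points farther than b from the pilot belong to none.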

Step 1: Data Splitting 



For simplicity, assume the sample size is even and denote it by 2n. We split the data into two halves, which we denote by $X = (X_1,\ldots,X_n)$ and $Y = (Y_1,\ldots,Y_n)$.



Step 2: Pilot Estimator 
Let $\hat q$ be the maximum likelihood estimator over $\mathcal{Q}$. Let $\hat M$ be the corresponding manifold. Studying the properties of $\hat M$ requires two steps: computing the bracketing entropy of $\mathcal{Q}$ and relating $H(\hat M, M)$ to $h(\hat q, q)$. The former allows us to apply Theorem 7 to bound $h(\hat q, q)$, and the latter allows us to control the Hausdorff distance.

Step 2a: Computing the Entropy of $\mathcal{Q}$. To compute the entropy of $\mathcal{Q}$ we start by constructing a finite net of manifolds to cover $\mathcal{M}(\kappa)$. A finite set of d-manifolds $\mathcal{M}_\gamma = \{M_1,\ldots,M_N\}$ is a $\gamma$-net (or a $\gamma$-cover) if, for each $M \in \mathcal{M}$, there exists $M_j \in \mathcal{M}_\gamma$ such that $H(M,M_j) \le \gamma$. Let $N(\gamma) = N(\gamma,\mathcal{M},H)$ be the size of the smallest covering set, called the (Hausdorff) covering number of $\mathcal{M}$.

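Theorem 9 The Hausdorff covering number of $\mathcal{M}$ satisfies

$$N(\gamma) = N(\gamma,\mathcal{M},H) \le c_1\,\kappa_2(\kappa,d,D)\exp\left(\kappa_3(\kappa,d,D)\,\gamma^{-d/2}\right) = c\exp\left(c'\gamma^{-d/2}\right) \qquad (23)$$

where $\kappa_2(\kappa,d,D) = \binom{D}{d}^{(C_2/\kappa)^D}$ and $\kappa_3(\kappa,d,D) = 2^{d/2}(D - d)(C_2/\kappa)^D$, for a constant $C_2$ that depends only on $\kappa$ and d.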

Proof Recall that the manifolds in $\mathcal{M}$ all lie within $\mathcal{K}$. Consider any hypercube containing $\mathcal{K}$. Divide this cube into a grid of $J = (2c/\kappa)^D$ sub-cubes $\{C_1,\ldots,C_J\}$ of side length $\kappa/c$, where $c \ge 4$ is a positive constant chosen to be sufficiently large. Our strategy is to show that within each of these cubes, the manifold is the graph of a smooth function. We then only need to count the number of such smooth functions.

In thinking about the manifold as (locally) the graph of a smooth function, it helps to be able to translate easily between the natural coordinates in $\mathcal{K}$ and the domain-range coordinates of the function. To that end, within each subcube $C_j$ for $j \in \{1,\ldots,J\}$, we define $K = \binom{D}{d}$ coordinate frames, $F_{jk}$ for $k \in \{1,\ldots,K\}$, in which d out of D coordinates are labeled as "domain" and the remaining D - d coordinates are labeled as "range."

Each frame is associated with a relabeling of the coordinates so that the d "domain" coordinates are listed first and the D - d "range" coordinates last. That is, $F_{jk}$ is defined by a one-to-one correspondence between $x \in C_j$ and $(u,v) = \pi_{jk}(x)$, where $u \in \mathbb{R}^d$, $v \in \mathbb{R}^{D-d}$ and $\pi_{jk}(x_1,\ldots,x_D) = (x_{i_1},\ldots,x_{i_d},x_{j_1},\ldots,x_{j_{D-d}})$ for domain coordinate indices $i_1 < \cdots < i_d$ and range coordinate indices $j_1 < \cdots < j_{D-d}$.
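The coordinate frames $F_{jk}$ and relabelings $\pi_{jk}$ are simple combinatorial objects. The Python sketch below enumerates the $K = \binom{D}{d}$ ways of choosing d domain coordinates and implements $\pi_{jk}$ and its inverse. Zero-based indices and the function names are assumptions of the sketch.

```python
from itertools import combinations
from math import comb

import numpy as np


def coordinate_frames(D, d):
    """All binom(D, d) frames as (domain indices, range indices), each sorted increasingly."""
    frames = []
    for domain in combinations(range(D), d):
        rest = tuple(i for i in range(D) if i not in domain)
        frames.append((domain, rest))
    return frames


def to_frame(x, frame):
    """pi_jk: split x into (u, v) = (domain coordinates, range coordinates)."""
    domain, rest = frame
    x = np.asarray(x, dtype=float)
    return x[list(domain)], x[list(rest)]


def from_frame(u, v, frame):
    """Inverse of to_frame: put u and v back into their original coordinate slots."""
    domain, rest = frame
    x = np.empty(len(domain) + len(rest))
    x[list(domain)] = u
    x[list(rest)] = v
    return x


frames = coordinate_frames(D=3, d=1)
print(len(frames), comb(3, 1))          # K = binom(D, d) frames
u, v = to_frame([10.0, 20.0, 30.0], frames[1])
print(frames[1], u, v)                  # domain coordinate x_2, range coordinates (x_1, x_3)
```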

We define $\mathrm{domain}(F_{jk}) = \{u \in \mathbb{R}^d : \exists\, v \in \mathbb{R}^{D-d} \text{ such that } (u,v) \in F_{jk}\}$, and let $\mathcal{G}_{jk}$ denote the class of functions defined on $\mathrm{domain}(F_{jk})$ whose second derivative (i.e., second fundamental form) is bounded above by a constant $C(\kappa)$ that depends only on $\kappa$. To say that a set $R \subset C_j$ is the graph of a function on a d-dimensional subset of the coordinates in $C_j$ is equivalent to saying that for some frame $F_{jk}$ and some set $A \subset \mathrm{domain}(F_{jk})$, $R = \pi_{jk}^{-1}(\{(u,f(u)) : u \in A\})$.

We will prove the theorem by establishing the following claims.

Claim 1. Let $M \in \mathcal{M}$ and let $C_j$ be a subcube that intersects M. Then: (i) for at least one $k \in \{1,\ldots,K\}$, the set $M \cap C_j$ is the graph of a function (i.e., single-valued mapping) defined on a set $A \subset \mathrm{domain}(F_{jk})$, of the form $(u_1,\ldots,u_d) \mapsto \pi_{jk}^{-1}((u,f(u)))$ for some function f on A, and (ii) this function lies in $\mathcal{G}_{jk}$.
Claim 2. $\mathcal{M}$ is in one-to-one correspondence with a subset of $\mathcal{G} = \prod_{j=1}^J\bigcup_{k=1}^K\mathcal{G}_{jk}$.

Claim 3. The $L^\infty$ covering number of $\mathcal{G}$ satisfies

$$N(\gamma,\mathcal{G},L^\infty) \le c_1\binom{D}{d}^{(2c/\kappa)^D}\exp\left((D - d)(2c/\kappa)^D\gamma^{-d/2}\right).$$

Claim 4. There is a one-to-one correspondence between a $\gamma/2$ $L^\infty$-cover of $\mathcal{G}$ and a $\gamma$ Hausdorff-cover of $\mathcal{M}$.

Taken together, the claims imply that

$$N(\gamma,\mathcal{M},H) \le c_1\binom{D}{d}^{(2c/\kappa)^D}\exp\left((D - d)(2c/\kappa)^D\,2^{d/2}\gamma^{-d/2}\right).$$

Taking $C_2 = 2c$ proves the theorem.

Proof of Claim 1. We begin by showing that (i) implies (ii). By part 1 of Lemma 3, each $M \in \mathcal{M}$ has curvature (second fundamental form) bounded above by $1/\kappa$. This implies that the function identified in (i) has uniformly bounded second derivative and thus lies in the corresponding $\mathcal{G}_{jk}$.

We prove (i) by contradiction. Suppose that there is an $M \in \mathcal{M}$ such that for every j with $M \cap C_j \ne \emptyset$, the set $M \cap C_j$ is not the graph of a single-valued mapping for any of the K coordinate frames.

Fix $j \in \{1,\ldots,J\}$. Then in each $\mathrm{domain}(F_{jk})$, there is a point u such that $C_j \cap \pi_{jk}^{-1}(\{u\} \times \mathbb{R}^{D-d})$ intersects M in at least two points; call them $a_k$ and $b_k$. By construction $\|a_k - b_k\| \le \sqrt{D - d}\,\kappa/c$, and hence by choosing c large enough (making the cubes small), part 3 of Lemma 3 tells us that $d_M(a_k,b_k) \le 2\sqrt{D - d}\,\kappa/c$. Then we argue as follows:



1. By parts 2 and 3 of Lemma 3 and the fact that $C_j$ has diameter $\sqrt D\,\kappa/c$,

$$\min_{p,q \in C_j \cap M}\cos(\mathrm{angle}(T_pM,T_qM)) \ge 1 - \frac{2\sqrt D}{c}.$$

For large enough c, the maximum angle between tangent vectors can be made smaller than $\pi/3$.

2. By part 2 of Lemma 3, for any point z along a geodesic between $a_k$ and $b_k$,

$$\cos(\mathrm{angle}(T_{a_k}M,T_zM)) \ge 1 - \frac{2\sqrt{D - d}}{c}.$$

It follows that there is a point in $C_j \cap M$ and a tangent vector $v_k$ at that point such that $\mathrm{angle}(v_k,b_k - a_k) = O(1/\sqrt c)$.

3. We thus have, for each of the $K = \binom{D}{d}$ coordinate frames, an associated tangent vector; these vectors $v_1,\ldots,v_K$ are each nearly orthogonal to at least d of the others. Consequently, there are at least d + 1 nearly orthogonal tangent vectors of M within $C_j$. This contradicts point 1 and proves the claim.
Proof of Claim 2. We construct the correspondence as follows. For each cube $C_j$, let $k_j^*$ be the smallest k such that $M \cap C_j$ is the graph of a function $\phi_{jk} \in \mathcal{G}_{jk}$ as in Claim 1. Map M to $\phi = (\phi_{1k_1^*},\ldots,\phi_{Jk_J^*})$, and let $\mathcal{T} \subset \mathcal{G}$ be the image of this map. If $M \ne M' \in \mathcal{M}$, then the corresponding $\phi$ and $\phi'$ must be distinct. If not, then $M \cap C_j = M' \cap C_j$ for all j, contradicting $M \ne M'$. The map from $\mathcal{M}$ to $\mathcal{T}$ is thus a one-to-one correspondence.



Proof of Claim 3. From the results in Birman and Solomjak (1967), the set of functions defined on a pre-compact d-dimensional set that take values in a fixed-dimensional space $\mathbb{R}^m$ with uniformly bounded second derivative has $L^\infty$ covering number bounded above by $c_1 e^{m(1/\gamma)^{d/2}}$ for some $c_1$. Part 1 of Lemma 3 shows that each $M \in \mathcal{M}$ has curvature (second fundamental form) bounded above by $1/\kappa$, so each $\mathcal{G}_{jk}$ satisfies Birman and Solomjak's conditions. Hence, $N(\gamma,\mathcal{G}_{jk},L^\infty) \le c_1 e^{(D-d)(1/\gamma)^{d/2}}$. Because all the $\mathcal{G}_{jk}$ are disjoint, simple counting arguments show that $N(\gamma,\mathcal{G},L^\infty) \le \left(\binom{D}{d}N(\gamma,\mathcal{G}_{jk},L^\infty)\right)^J$, where J is the number of cubes defined above. The claim follows. (Note that the functions in Claim 1 are defined on a subset of $\mathrm{domain}(F_{jk})$. But because all such functions have an extension in $\mathcal{G}_{jk}$, a covering of $\mathcal{G}_{jk}$ also covers these functions defined on restricted domains.)

Proof of Claim 4. First, note that if two functions are less than $\gamma$ apart in $L^\infty$, their graphs are less than $\gamma$ apart in Hausdorff distance, and vice versa. This implies that a $\gamma$ $L^\infty$-cover of a set of functions corresponds directly to a $\gamma$ Hausdorff-cover of the set of the functions' graphs. Hence, in the argument that follows, we can work with functions or graphs interchangeably.

For $k \in \{1,\ldots,K\}$, let $\mathcal{G}_{jk}^\gamma$ be a minimal $L^\infty$ cover of $\mathcal{G}_{jk}$ by $\gamma/2$ balls; specifically, we assume that $\mathcal{G}_{jk}^\gamma$ is the set of centers of these balls. For each $g_{jk} \in \mathcal{G}_{jk}^\gamma$, define $f_{jk}(u) = \pi_{jk}^{-1}(u, g_{jk}(u))$. For every j, choose one such $f_{jk}$, and define a set $M' = \bigcup_j (C_j \cap \mathrm{range}(f_{jk}))$, which is a union of manifolds with boundary that have curvature bounded by $1/\kappa$. That is, such an $M'$ is piecewise smooth (smooth within each cube) but may fail to satisfy $\Delta(M') \ge \kappa$ globally. Let $\mathcal{A}$ be the collection of $M'$ constructed this way. There are $N(\gamma/2,\mathcal{G},L^\infty)$ elements in this collection.

By construction and Claim 2, for each $M \in \mathcal{M}$, there exists an $M' \in \mathcal{A}$ such that $H(M, M') \le \gamma/2$. In other words, the set of $\gamma/2$ Hausdorff balls around the manifolds in $\mathcal{A}$ covers $\mathcal{M}$, but the elements of $\mathcal{A}$ are not themselves necessarily in $\mathcal{M}$. Let $B_H(A,\gamma/2)$ denote the set of all d-manifolds $M \in \mathcal{M}$ such that $H(A,M) \le \gamma/2$. Let

$$\mathcal{A}_0 = \{A \in \mathcal{A} : B_H(A,\gamma/2) \cap \mathcal{M} \ne \emptyset\}. \qquad (24)$$

For each $A \in \mathcal{A}_0$, choose some $\tilde A \in B_H(A,\gamma/2) \cap \mathcal{M}$. By the triangle inequality, the set $\{\tilde A : A \in \mathcal{A}_0\}$ forms a $\gamma$ Hausdorff-net for $\mathcal{M}$. This proves the claim. ■



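We are almost ready to compute the entropy. We will need the following lemma.

Lemma 10 Let $0 < \gamma < \kappa - \sigma$. There exists a constant $K > 0$ (depending only on $\mathcal{K}$, $\kappa$ and $\sigma$) such that, for any $M_1, M_2 \in \mathcal{M}(\kappa)$, $H(M_1, M_2) \le \gamma$ implies that $|V(M_1 \oplus \sigma) - V(M_2 \oplus \sigma)| \le K\gamma$. Also, for any $M \in \mathcal{M}(\kappa)$, $|V(M \oplus (\sigma + \gamma)) - V(M \oplus \sigma)| \le K\gamma$.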
Proof Let $S_j = M_j \oplus \sigma$, $j = 1, 2$. Then, using (14),

$$S_2 \subset M_1 \oplus (\sigma + \gamma) = \bigcup_{u \in M_1} L_{\sigma+\gamma}(u). \qquad (25)$$

Hence, uniformly over $\mathcal{M}$,

$$V(S_2) \le \int_{M_1} v_{D-d}(L_{\sigma+\gamma}(u))\, d\mu_{M_1}(u) \le \int_{M_1} v_{D-d}(L_{\sigma}(u))\, d\mu_{M_1}(u) + K\gamma = V(S_1) + K\gamma,$$

since $v_{D-d}(B(u, \sigma + \gamma)) \le v_{D-d}(B(u, \sigma)) + K\gamma$ for some $K > 0$ not depending on $M_1$ or $M_2$. By a symmetric argument, $V(S_1) \le V(S_2) + K\gamma$. Hence, $|V(M_1 \oplus \sigma) - V(M_2 \oplus \sigma)| \le K\gamma$. The second statement is proved in a similar way. ■
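
As a concrete check of the second statement of Lemma 10, take $D = 2$, $d = 1$ and let $M$ be a circle of radius $R$ (so $\Delta(M) = R$). The tube $M \oplus s$ is then an annulus of area $4\pi R s$ for $s < R$, so the volume grows exactly linearly in the tube radius, with $K = 4\pi R$. The Python sketch below is only an illustration under these assumptions; the function name is hypothetical.

```python
import math


def annulus_tube_area(R, s):
    """Area of the tube M (+) s around a circle M of radius R in the plane (needs s < R).

    The tube is the annulus R - s <= ||y|| <= R + s, whose area is 4 * pi * R * s.
    """
    return math.pi * ((R + s) ** 2 - (R - s) ** 2)


R, sigma, gamma = 1.0, 0.2, 0.01
growth = annulus_tube_area(R, sigma + gamma) - annulus_tube_area(R, sigma)
print(growth, 4 * math.pi * R * gamma)  # both approximately 0.12566
```

For a curved manifold of higher dimension the growth is no longer exactly linear, which is why the lemma only claims a Lipschitz bound.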

Now we construct a Hellinger bracketing. Let $\gamma = \epsilon^2$. Let $\mathcal{M}_\gamma = \{M_1, \dots, M_N\}$ be a $\gamma$-Hausdorff net of manifolds. Thus, by Theorem 9, $N = N(\epsilon^2, \mathcal{M}, H) \le c_1 e^{c_2 (1/\epsilon)^d}$. Let $\omega_\sigma$ denote the volume of a sphere of radius $\sigma$. Let $q_j$ be the density corresponding to $M_j$.

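Define

$$u_j(y) = \left(q_j(y) + \frac{2K\epsilon^2}{V(M_j \oplus (\sigma + \epsilon^2))}\right) I\big(y \in M_j \oplus (\sigma + \epsilon^2)\big)$$

and

$$\ell_j(y) = \left(q_j(y) - \frac{2K\epsilon^2}{V(M_j \oplus (\sigma - \epsilon^2))}\right) I\big(y \in M_j \oplus (\sigma - \epsilon^2)\big),$$

and let $\mathcal{B} = \{(\ell_1, u_1), \dots, (\ell_N, u_N)\}$.

Lemma 11 $\mathcal{B}$ is an $\epsilon$-Hellinger bracketing of $\mathcal{Q}$. Hence, $\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}, h) \le C(1/\epsilon)^d$.

Proof Let $M \in \mathcal{M}(\kappa)$ and let $Q = Q_M$ be the corresponding distribution. Let $q$ be the density of $Q$. $Q$ is supported on $S = M \oplus \sigma$. There exists $M_j \in \mathcal{M}_\gamma$ such that $H(M, M_j) < \epsilon^2$. Let $y$ be in $S$. Then there is an $x \in M$ such that $\|y - x\| \le \sigma$, and there is an $x' \in M_j$ such that $\|x - x'\| < \epsilon^2$. Hence, $d(y, M_j) < \sigma + \epsilon^2$, and thus $y$ is in the support of $u_j$. Now, for $y \in S$, $u_j(y) - q(y) > 0$. Hence, $q(y) \le u_j(y)$. By a similar argument, $\ell_j(y) \le q(y)$. Thus $\mathcal{B}$ is a bracketing. Now

$$\ell_1(\ell_j, u_j) = \int (u_j - \ell_j) = \int_{M_j \oplus (\sigma+\epsilon^2)} \left(q_j + \frac{2K\epsilon^2}{V(M_j \oplus (\sigma + \epsilon^2))}\right) - \int_{M_j \oplus (\sigma-\epsilon^2)} \left(q_j - \frac{2K\epsilon^2}{V(M_j \oplus (\sigma - \epsilon^2))}\right) \le C\epsilon^2.$$

Finally, by (11), $h(u_j, \ell_j) \le \sqrt{\ell_1(\ell_j, u_j)} = C\epsilon$. Thus $\mathcal{B}$ is a $C\epsilon$-Hellinger bracketing. ■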
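
The last step of the proof uses the elementary inequality $h^2(f,g) \le \ell_1(f,g)$ for nonnegative functions, which holds pointwise because $(\sqrt a - \sqrt b)^2 \le |\sqrt a - \sqrt b|(\sqrt a + \sqrt b) = |a - b|$. A minimal Python check on a discretized pair of functions (the bracket values below are made up for illustration and need not integrate to one):

```python
import math


def hellinger_sq(f, g):
    """Squared Hellinger distance: sum of (sqrt(f_i) - sqrt(g_i))**2 for nonnegative weights."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g))


def l1(f, g):
    """L1 distance: sum of |f_i - g_i|."""
    return sum(abs(a - b) for a, b in zip(f, g))


# Illustrative lower and upper bracket functions on a 4-point grid.
lower = [0.20, 0.25, 0.25, 0.20]
upper = [0.22, 0.27, 0.26, 0.23]
print(hellinger_sq(lower, upper), l1(lower, upper))
print(math.sqrt(hellinger_sq(lower, upper)) <= math.sqrt(l1(lower, upper)))  # True
```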



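Step 2b. Hellinger Rate.

Lemma 12 Let $\hat Q$ be the mle. Then

$$\sup_{Q \in \mathcal{Q}} Q^n\left(\left\{h(\hat Q, Q) > C n^{-\frac{1}{2+d}}\right\}\right) \le \exp\left\{-C n^{\frac{d}{2+d}}\right\}.$$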
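Proof We have shown (Lemma 11) that $\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}, h) \le C(1/\epsilon)^d$. Solving the equation $\mathcal{H}_{[\,]}(\epsilon_n, \mathcal{Q}, h) = n\epsilon_n^2$ from Theorem 7, we get $\epsilon_n = (1/n)^{1/(d+2)}$. From Lemma 8, for all $Q$,

$$Q^n\left(\left\{h(\hat Q, Q) > C n^{-\frac{1}{2+d}}\right\}\right) \le c_1 e^{-c_2 n \epsilon_n^2} = \exp\left\{-C n^{\frac{d}{2+d}}\right\}. \qquad ■$$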
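
The rate $\epsilon_n$ in Lemma 12 comes from balancing the entropy bound $(1/\epsilon)^d$ against $n\epsilon^2$. The Python sketch below solves $(1/\epsilon)^d = n\epsilon^2$ by bisection (all constants set to one, an assumption made only for illustration) and compares the root with the closed form $n^{-1/(d+2)}$.

```python
def solve_entropy_equation(n, d, iters=200):
    """Bisection for the root eps in (0, 1) of (1/eps)**d = n * eps**2 (constants set to 1)."""

    def gap(eps):
        # Decreasing in eps: positive for tiny eps, negative at eps = 1 when n > 1.
        return (1.0 / eps) ** d - n * eps ** 2

    lo, hi = 1e-12, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n, d in ((10 ** 4, 1), (10 ** 6, 2), (10 ** 8, 3)):
    eps_n = solve_entropy_equation(n, d)
    print(n, d, eps_n, n ** (-1 / (d + 2)), n * eps_n ** 2)  # last column is about n**(d/(d+2))
```

The last column, $n\epsilon_n^2 = n^{d/(d+2)}$, is the exponent appearing in the probability bound of Lemma 12.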

Step 2c. Relating Hellinger Distance and Hausdorff Distance.



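Lemma 13 Let $c = \frac{C_*\pi^{D/2}}{\Gamma(D/2+1)}\left(\frac{\kappa - \sigma}{2}\right)^D$. If $M_1, M_2 \in \mathcal{M}(\kappa)$ and $h(Q_1, Q_2) < c$, then

$$H(M_1, M_2) \le 2\left(\frac{\Gamma(D/2+1)}{\pi^{D/2} C_*}\right)^{1/D} h^{1/D}(Q_1, Q_2).$$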

Proof Let $b = H(M_1, M_2)$ and $\gamma = \min\{\kappa - \sigma, b\}$. Let $S_1, S_2$ be the supports of $Q_1$ and $Q_2$. Because $H(M_1, M_2) = b$, we can find points $x \in M_1$ and $y \in M_2$ such that $\|y - x\| = b$. Note that $T_x M_1$ and $T_y M_2$ are parallel; otherwise we could move $x$ or $y$ and increase $\|y - x\|$. It follows that the line segment $[x, y]$ is along a common normal vector of the two manifolds, and we can write $y = x \pm b u$ for some unit vector $u$ normal to $M_1$ at $x$. Without loss of generality, assume that $y = x + bu$. Let $x' = x + \sigma u$ and $y' = y + \sigma u$. Hence, $x' \in \partial S_1$, $y' \in \partial S_2$ and $\|x' - y'\| = b$. Note that $\partial S_1$ and $\partial S_2$ are themselves smooth $(D-1)$-manifolds with $\Delta(\partial S_i) \ge \kappa - \sigma > 0$.

We now make the following three claims:

1. $y' \in S_2 - S_1$.

2. $(x', y'] \subset S_2 - S_1$.

3. $\mathrm{interior}\, B\left(\frac{x' + y'}{2}, \frac{\gamma}{2}\right) \subset S_2 - S_1$.

First, note that $y'$ differs from $y$ along a fiber of $M_2$ by exactly $\sigma$; therefore $[x', y'] \subset S_2$. Second, because $x' \in \partial S_1$, there is a neighborhood of $x'$ in $[x', y']$ that is not contained in $S_1$. Hence, if there is a point in $S_1 \cap (x', y']$, there must be a point $z' \in \partial S_1 \cap [x', y']$ with $z' \neq x'$. This implies the existence of two distinct points whose fibers of length less than $\kappa - \sigma$ cross, which contradicts the fact that $\Delta(\partial S_1) \ge \kappa - \sigma$. Claims 1 and 2 follow.

Let $B = B\left(\frac{x' + y'}{2}, \frac{\gamma}{2}\right)$. By construction, $B$ is tangent to $\partial S_1$ at $x'$ and tangent to $\partial S_2$ at $y'$, and $B$ contains $[x', y']$. The ball has radius $\gamma/2 = (1/2)\min\{\kappa - \sigma, b\} < \kappa - \sigma$. Because $B$ intersects $S_2 - S_1$, the interior of $B$ cannot intersect either $\partial S_1$ or $\partial S_2$. Claim 3 follows by an argument similar to the proof of Claim 2. (In particular, if there were a point in the interior of $B$ that is either in $S_1$ or outside $S_2$, a line segment from $(x' + y')/2$ to that point would have to intersect the corresponding boundary, which cannot happen.) Now $V(B) = (\gamma/2)^D \pi^{D/2}/\Gamma(D/2 + 1)$. So, since $q_1 = 0$ on $S_2 \cap S_1^c \supset B$,

$$h(Q_1, Q_2) \ge h^2(Q_1, Q_2) = \int (\sqrt{q_1} - \sqrt{q_2})^2 \ge \int_{S_2 \cap S_1^c} (\sqrt{q_1} - \sqrt{q_2})^2$$

$$= \int_{S_2 \cap S_1^c} q_2 = Q_2(S_2 \cap S_1^c) \ge C_* V(B) = C_* (\gamma/2)^D \pi^{D/2}/\Gamma(D/2 + 1).$$
Hence,

$$\gamma = \min\{\kappa - \sigma, b\} \le 2\left(\frac{\Gamma(D/2+1)}{\pi^{D/2} C_*}\right)^{1/D} h^{1/D}(Q_1, Q_2).$$

If $\kappa - \sigma < b$, this implies that $h(Q_1, Q_2) \ge c$, which contradicts the assumption that $h(Q_1, Q_2) < c$. Therefore, $\gamma = b$ and the conclusion follows. ■
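
To see how the constant in Lemma 13 arises, the Python sketch below evaluates the bound $2(\Gamma(D/2+1)/(\pi^{D/2}C_*))^{1/D} h^{1/D}$ and checks that it returns exactly $\gamma$ when $h$ equals the mass $C_* V(B)$ of the ball $B$ of radius $\gamma/2$ used in the proof. The value of $C_*$ is an arbitrary choice for illustration.

```python
import math


def ball_volume(D, r):
    """Volume of a D-dimensional Euclidean ball of radius r."""
    return math.pi ** (D / 2) * r ** D / math.gamma(D / 2 + 1)


def hausdorff_bound_from_hellinger(h, D, c_star):
    """Right-hand side of Lemma 13: 2 * (Gamma(D/2+1) / (pi^(D/2) C_*))^(1/D) * h^(1/D)."""
    const = (math.gamma(D / 2 + 1) / (math.pi ** (D / 2) * c_star)) ** (1 / D)
    return 2 * const * h ** (1 / D)


# Round trip: if h equals C_* V(B) for the ball B of radius gamma/2 used in the proof,
# the bound returns gamma.
D, c_star, gamma = 3, 0.5, 0.1
h = c_star * ball_volume(D, gamma / 2)
print(hausdorff_bound_from_hellinger(h, D, c_star))  # 0.1 up to rounding
```

The $D$-th root is what makes this pilot bound slow: a Hellinger error of order $h$ only controls the Hausdorff error at order $h^{1/D}$.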

Step 2d. Computing the Hausdorff Rate of the Pilot.

Lemma 14 Let $a_n = (1/n)^{\frac{1}{D(d+2)}}$. For all large $n$,

$$\sup_{Q \in \mathcal{Q}} Q^n\left(\left\{H(\hat M, M) > a_n\right\}\right) \le \exp\left\{-C n^{\frac{d}{2+d}}\right\}. \qquad (26)$$

Proof Follows by combining Lemma 12 and Lemma 13. ■



We conclude that, with high probability, the true manifold $M$ belongs to the set $\mathcal{M}_n = \{M \in \mathcal{M}(\kappa) : H(\hat M, M) \le a_n\}$.



Step 3: Cover With Slabs

Now we cover the pilot estimator $\hat M$ with (possibly overlapping) slabs. Let $\delta_n = \left(\frac{\log n}{n}\right)^{\frac{1}{2+d}}$. It follows from part 6 of Lemma 3 that there exists a collection of points $F = \{x_1, \dots, x_N\} \subset \hat M$ such that $N = (c\delta_n)^{-d} = (Cn/\log n)^{d/(2+d)}$ and such that $\hat M \subset \bigcup_{j=1}^N B_D(x_j, c\delta_n)$.
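
Part 6 of Lemma 3 is not reproduced here; as a generic stand-in, the Python sketch below builds a greedy $\delta$-net of a densely sampled unit circle ($d = 1$, $D = 2$) and shows the number of centers scaling like $\delta^{-d}$, which is the behavior of $N = (c\delta_n)^{-d}$ used above. The greedy construction is an assumption for illustration, not necessarily the construction behind Lemma 3.

```python
import math


def greedy_net(points, delta):
    """Greedy delta-net: keep a point as a new center unless it is within delta of an existing one.

    Every input point ends up within delta of some center.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) > delta for c in centers):
            centers.append(p)
    return centers


# Dense sample of the unit circle (d = 1, D = 2).
pts = [(math.cos(2 * math.pi * k / 2000), math.sin(2 * math.pi * k / 2000)) for k in range(2000)]
for delta in (0.2, 0.1, 0.05):
    print(delta, len(greedy_net(pts, delta)))  # roughly proportional to 1 / delta
```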

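Step 3a. The Fibers of $\hat M$ Cover $M$ Nicely.

Lemma 15 Let $b = \sigma + a_n$. For $x \in \hat M$, let $L_b(x) = T_x^\perp \hat M \cap B_D(x, b)$ be a fiber at $x$ of size $b$. Let $M \in \mathcal{M}_n$. Then:

1. If $x \in M$ and $\hat x \in \hat M$ are such that $\|x - \hat x\| \le a_n$, then $\mathrm{angle}(T_x M, T_{\hat x} \hat M) < \pi/4$.

2. $L_b(x) \cap M \neq \emptyset$.

3. If $\tilde x \in L_b(x) \cap M$, then $\|\tilde x - x\| \le 2a_n$.

4. For any $x \in \hat M$, $\#\{L_b(x) \cap M\} = 1$.

5. We have $M \subset \bigcup_{x \in \hat M} L_b(x)$.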

Proof 1. Let $x$ and $\hat x$ be as given in the statement of the lemma and let $\theta = \mathrm{angle}(T_x M, T_{\hat x} \hat M)$. Suppose that $\theta \ge \pi/4$. There exist unit vectors $u \in T_{\hat x} \hat M$ and $v \in T_x M$ such that $\mathrm{angle}(u, v) = \theta$. Without loss of generality, we can assume that $x = \hat x$. (The extension to the case $x \neq \hat x$ is straightforward.)

Figure 4: Figure for the proof of part 1 of Lemma 15

Consider the plane defined by $u$ and $v$ as in Figure 4. We assume, without loss of generality, that $(u + v)/2$ generates the $x$-axis in this plane, that $v$ lies above the $x$-axis and that $u$ lies below it. Let $\ell$ denote the horizontal line parallel to the $x$-axis and lying $2a_n$ units above it. Hence, $u$ and $v$ each make an angle of at least $\pi/8$ with the $x$-axis.

Consider the two circles $C_1$ and $C_2$ tangent to $M$ at $x$ with radius $\kappa$, where $C_1$ lies below $v$ and $C_2$ lies above $v$. Let $w$ be the point at which $C_1$ intersects $\ell$. The arclength of $C_1$ from $x$ to $w$ is $Ca_n$ for some $C > 1$. Let $\gamma$ be the geodesic on $M$ through $x$ with gradient $v$. The projection $\bar\gamma$ of $\gamma$ into the plane must fall between $C_1$ and $C_2$. Let $y = \gamma(Ca_n)$ and let $\bar y$ be the projection of $y$ into the plane.

Now $\|y - x\| \ge \|\bar y - x\| \ge \|w - x\| \ge 2a_n > a_n$. There exists $z \in \hat M$ such that $\|z - y\| \le a_n$. Hence, $\|\bar z - \bar y\| \le a_n$, where $\bar z$ is the projection of $z$ into the plane. Let $q$ be the point on the plane with coordinates $(a_n\sqrt{C^2 - 1}, a_n)$. Thus, $\|q - x\| = Ca_n$. Note that $\mathrm{angle}(\bar z - x, u)$ is larger than the angle between $q - x$ and the $x$-axis, which is

$$\arctan\left(\frac{1}{\sqrt{C^2 - 1}}\right) = \alpha > 0.$$

Hence,

$$\mathrm{angle}(z - x, u) \ge \mathrm{angle}(\bar z - x, u) > \alpha.$$

Let $\hat\gamma$ be a geodesic on $\hat M$, parameterized by arclength, connecting $x$ and $z$. Thus $\hat\gamma(0) = x$ and $\hat\gamma(T) = z$ for some $T$. There exists some $0 \le t \le T$ such that $\hat\gamma'(t) \propto z - x$. So

$$\mathrm{angle}(\hat\gamma'(t), \hat\gamma'(0)) \ge \alpha > 0.$$

However, $\|z - x\| \le (C + 1)a_n$, which implies, by part 2 of Lemma 3, that $\mathrm{angle}(\hat\gamma'(t), \hat\gamma'(0)) = O(\sqrt{a_n}) < \alpha$, which is a contradiction.
2. For any $x \in \hat M$, the closest point $\tilde x \in M$ must satisfy $\|x - \tilde x\| \le a_n$. Let $y$ be the projection of $\tilde x$ onto $T_x \hat M$. Let $U = T_x \hat M \cap B_D(y, a_n)$ and let $\mathrm{Cyl} = \bigcup_{u \in U} B_D(u, 3a_n) \cap (T_x \hat M)^\perp$. $\mathrm{Cyl}$ is a small hyper-cylinder containing $y$ and $\tilde x$, with the former at its center. $M$ cannot intersect the top or bottom faces of the cylinder; otherwise, we could find a point $p \in M$ such that $\mathrm{angle}(T_x \hat M, T_p M) > \arctan(1) = \pi/4$, contradicting part 1. Thus, any path through $\tilde x$ on $M$ must intersect the sides of $\mathrm{Cyl}$. Hence, $L_b(x) \cap M \neq \emptyset$.



3. Let $\tilde x \in M \cap L_b(x)$. Suppose that $\|\tilde x - x\| > 2a_n$. There exists $q \in \hat M$ such that $\|q - \tilde x\| \le a_n$. Note that $\|q - x\| > a_n$. Now we apply part 5 of Lemma 3 with $p = x$ and $v = \tilde x$. This yields an upper bound on $\|v - p\| = \|\tilde x - x\|$ that contradicts the assumption that $\|\tilde x - x\| > 2a_n$.



4. Suppose that more than one point of $M$ were in $L_b(x)$. Pick two and call them $x_1$ and $x_2$. By part 3, $\|x_i - x\| \le 2a_n$ for $i = 1, 2$. It follows that $\|x_1 - x_2\| \le 4a_n$, and thus they are $O(a_n)$ close in geodesic distance by part 3 of Lemma 3. Hence, there is a geodesic on $M$ connecting $x_1$ and $x_2$ that is contained strictly within the $Ca_n$ ball. Because $x_2 - x_1$ lies in $L_b(x)$ and is consequently orthogonal to $T_x \hat M$, there must exist a point on the geodesic whose tangent makes an angle of $\pi/2$ with $T_x \hat M$, contradicting part 1.



5. Because $H(\hat M, M) \le a_n$, we have that $M \subset \mathrm{tube}(\hat M, a_n)$. Because $a_n < \kappa$, the fibers $L_b(x)$ partition $\mathrm{tube}(\hat M, a_n)$. Hence, each $\tilde x \in M$ must lie on one (and only one) $L_b(x)$. ■

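Step 3b. Construct slabs that cover $M$ nicely. Let $\mathcal{I}_j = B_D(x_j, \delta_n) \cap \hat M$. Define the slab

$$R_j = \bigcup_{x \in \mathcal{I}_j} L_b(x, \hat M). \qquad (27)$$

Lemma 16 The collection of slabs $R_1, \dots, R_N$ has the following properties. Let $M \in \mathcal{M}_n$.

1. $M \subset \bigcup_{j=1}^N R_j$.

2. $M \cap R_j$ is function-like over $\mathcal{I}_j$. That is, there exists a function $g_j : \mathcal{I}_j \to \mathbb{R}^{D-d}$ such that $M \cap R_j = \{g_j(x) : x \in \mathcal{I}_j\}$.

3. For each $x \in \mathcal{I}_j$, $L_b(x) \cap M \neq \emptyset$.

4. There exists a linear function $\ell_j : \mathcal{I}_j \to \mathbb{R}^{D-d}$ such that $\sup_{x \in \mathcal{I}_j} \|g_j(x) - \ell_j(x)\| \le C\delta_n^2$.

5. $\sup_{M \in \mathcal{M}_n} \mathrm{diam}(M \cap R_j) \le C\delta_n$.

Thus the slabs cover $M$, and $M$ cuts across $R_j$ in a function-like way. Moreover, $M \cap R_j$ is nearly linear.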
Proof The first three claims follow immediately from Lemma 15. In particular, $g_j$ in claim 2 is defined by $g_j(x) = M \cap L_b(x)$. Now we show 4. We can write $g_j(x) = g_j(x_j) + (x - x_j)^T \nabla g + \frac{1}{2}(x - x_j)^T \mathrm{Hess}\,(x - x_j)$, where Hess is the Hessian matrix of $g_j$ evaluated at some point between $x$ and $x_j$. By part 1 of Lemma 3, the largest eigenvalue of Hess is bounded above by $1/\kappa$. Since $\|x - x_j\| \le c\delta_n$, the claim follows. Part 5 follows easily. ■
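
Part 4 of Lemma 16 says that a manifold with curvature at most $1/\kappa$ deviates from its linear approximation by $O(\delta_n^2)$ over a patch of radius $\delta_n$. A minimal Python illustration with a circle of radius $\kappa$ written as a graph over its tangent line (an assumed toy case with $d = 1$):

```python
import math


def max_deviation_from_tangent(kappa, delta, steps=1000):
    """Largest |g(x) - l(x)| over |x| <= delta for the circle of radius kappa written as the
    graph g(x) = kappa - sqrt(kappa**2 - x**2) over its tangent line at 0, where the linear
    approximation is l(x) = 0. The curvature of this curve is 1 / kappa."""
    xs = (-delta + 2 * delta * i / steps for i in range(steps + 1))
    return max(kappa - math.sqrt(kappa ** 2 - x ** 2) for x in xs)


kappa = 1.0
for delta in (0.2, 0.1, 0.05):
    dev = max_deviation_from_tangent(kappa, delta)
    print(delta, dev, dev <= delta ** 2 / kappa)  # deviation is quadratic in delta
```

Halving $\delta$ roughly quarters the deviation, matching the $C\delta_n^2$ bound.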



Step 4: Local Conditional Likelihood

Recall that $\mathcal{M}_n = \{M \in \mathcal{M}(\kappa) : H(\hat M, M) \le a_n\}$. Let

$$\mathcal{Q}_n = \{Q_M : M \in \mathcal{M}_n\}. \qquad (28)$$

Consider a slab $R_j$. For each $Q \in \mathcal{Q}_n$ define $Q_j = Q(\cdot \mid R_j)$ by $Q_j(A) = Q(A \cap R_j)/Q(R_j)$. Note that $Q_j$ is supported over $\mathrm{tube}(M, \sigma) \cap R_j$. Let $\mathcal{Q}_{n,j} = \{Q_j : Q \in \mathcal{Q}_n\}$. Before we proceed we need to establish the following.

Lemma 17 Let $\mathcal{Z}_j(M) = \mathrm{tube}(M, \sigma) \cap R_j$. Then there exists $c_0 > 0$ such that

$$\inf_{M \in \mathcal{M}_n} V(\mathcal{Z}_j(M)) \ge c_0 \delta_n^d.$$

Proof By Lemma 16, $M \cap R_j$ lies in a slab of size $a_n$ orthogonal to $\mathcal{I}_j$. Because the angle between the two manifolds on this set must be no more than $\pi/4$ and because $a_n > \delta_n$, the manifold $M$ cannot intersect both the "top" and "bottom" surfaces of the slab. Hence, for large enough $C > 0$, $J_j = \bigcup_{x \in \mathcal{I}_j} B_D(x, \sigma/C) \subset \mathcal{Z}_j(M)$. By construction, $V(\mathcal{Z}_j(M)) \ge V(J_j) \ge c\delta_n^d$. ■

Step 4a. The Entropy of $\mathcal{Q}_{n,j}$.

Lemma 18 $\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le c_1 \log(c_2/\epsilon)$.

Proof We begin by creating a $\gamma$-Hausdorff net for $\mathcal{Q}_{n,j}$. To do this, we will parameterize the support of these distributions. Each $Q \in \mathcal{Q}_{n,j}$ has support in the collection $\mathcal{S}_{n,j} = \{(M \oplus \sigma) \cap R_j : M \in \mathcal{M}_n\}$. We will construct a $\gamma$-Hausdorff net for $\mathcal{S}_{n,j}$.

Let $x \in \hat M$ be the center of $\mathcal{I}_j$. Let $y_1, \dots, y_r$ be a $c_1\gamma$-net of $L_b(x)$, and let $\theta_1 < \theta_2 < \cdots < \theta_s < \pi/2 - \eta$ for a small, fixed $\eta > 0$, where $\theta_j - \theta_{j-1} < c_2\gamma$. Note that $r = O(\gamma^{-(D-d)})$ and $s = O(1/\gamma)$. For every pair $y_i$ and $\theta_j$, let $M_{ij}$ be an $M \in \mathcal{M}_n$ that crosses through $y_i$ with $\mathrm{angle}(T_{y_i} M, T_x \hat M) = \theta_j$. These manifolds comprise a collection of size $O((1/\gamma)^{D-d+1})$, which we will denote by $\mathrm{Net}(\gamma)$.

Let $M \in \mathcal{M}_n$. Let $y$ be the point where $M$ crosses $L_b(x)$. Let $y_i$ be the closest point in the net to $y$ and let $\theta_j$ be the closest angle in the net to $\mathrm{angle}(T_y M, T_x \hat M)$. Because the angle between $M$ and $M_{ij}$ is strictly less than $\pi/4$ (part 1 of Lemma 15) and the slab $R_j$ has radius $\delta_n$, it follows that $H(M, M_{ij}) \le C_1\gamma + \delta_n C_2\gamma \le C\gamma$. Hence, $\mathrm{Net}(\gamma)$ is a $C\gamma$-Hausdorff net.

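Now consider $\mathrm{Net}(\gamma)$ with $\gamma = \epsilon^2$. For each $M_{ij} \in \mathrm{Net}(\gamma)$ let $q_{ij}$ be the corresponding density and define $u_{ij}$ and $\ell_{ij}$ by

$$u_{ij}(y) = \left(q_{ij}(y) + \frac{C\epsilon^2}{V(M_{ij} \oplus (\sigma + \epsilon^2))}\right) I\big(y \in M_{ij} \oplus (\sigma + \epsilon^2)\big)$$

and

$$\ell_{ij}(y) = \left(q_{ij}(y) - \frac{C\epsilon^2}{V(M_{ij} \oplus (\sigma - \epsilon^2))}\right) I\big(y \in M_{ij} \oplus (\sigma - \epsilon^2)\big).$$

Let $\mathcal{B} = \{(\ell_{ij}, u_{ij})\}$.

Let $M \in \mathcal{M}_n$, let $q$ be the density of $Q_M$, and let $M_{ij}$ be the element of the net closest to $M$. It follows easily that $u_{ij} \ge q \ge \ell_{ij}$. Thus $\mathcal{B}$ is a bracketing. Now,

$$\int (u_{ij} - \ell_{ij}) = 1 + C\epsilon^2 - (1 - C\epsilon^2) = 2C\epsilon^2.$$

Hence, $h(u_{ij}, \ell_{ij}) \le \sqrt{\int |u_{ij} - \ell_{ij}|} = \sqrt{2C}\,\epsilon$, so $\mathcal{B}$ is a $\sqrt{2C}\,\epsilon$-bracketing. Since $\mathcal{B}$ has $O((1/\epsilon)^{2(D-d+1)})$ elements,

$$\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le 2(D - d + 1)\log(c/\epsilon), \qquad (29)$$

which proves the lemma. ■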



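Step 4b. Hellinger Rate of the Conditional MLE. Let $\hat q$ be the mle over $\mathcal{Q}_{n,j}$ using the $Y_i$'s in $R_j$. Let $\hat M$ be the manifold corresponding to $\hat q$ and let $\hat M_j = \hat M \cap R_j$.

Lemma 19 For all $Q$, all $A > 0$ and all large $n$,

$$Q^n\left(\left\{h(\hat Q_j, Q_j) > C_0\left(\frac{\log n}{n}\right)^{\frac{1}{2+d}}\right\}\right) < n^{-A}.$$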

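Proof Let $N_j$ be the number of observations from the second half of the data that are in $R_j$. Let $\mu_j = E(N_j)$ and define $m_n = n^{\frac{2}{2+d}}$. First, we claim that $N_j \ge \mu_j/2$, which is of order at least $m_n$, for all $j$, except on a set of probability at most $\exp\{-cn^{2/(2+d)}\}$. Let $\pi_j = Q(R_j)$. By Lemma 17, and because the densities are bounded below by $C_*$, $\pi_j \ge c\delta_n^d$ for some $c > 0$. Hence, $\mu_j \ge m_n$ for large $n$. Note that $\sigma_j^2 = \mathrm{Var}(N_j)/n = \pi_j(1 - \pi_j) \le \pi_j$. Let $t = \mu_j/2$. By Bernstein's inequality,

$$P(N_j < \mu_j/2) = P(N_j - \mu_j < -\mu_j/2) \le \exp\left\{-\frac{t^2}{2(n\sigma_j^2 + t/3)}\right\} \le \exp\left\{-cn^{2/(2+d)}\right\}.$$

Hence, by the union bound,

$$P(N_j < \mu_j/2 \text{ for some } j) \le N\exp\left\{-cn^{2/(2+d)}\right\} \le \exp\left\{-c'n^{2/(2+d)}\right\}$$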
since there are $N = O(\delta_n^{-d})$ slabs. Thus we can assume that there are at least order $m_n$ observations in each $R_j$.

Since $\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le \log(C/\epsilon)$, solving the equation $\mathcal{H}_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) = m_n\epsilon^2$ we get $\epsilon_{m_n} \ge \sqrt{C\log m_n/m_n} = (\log n/n)^{\frac{2}{2(2+d)}} = \delta_n$. From Lemma 8 we have, for all $Q \in \mathcal{Q}_{n,j}$,

$$Q^n(\{h(\hat Q, Q) > \delta_n\}) = Q^n(\{h(\hat Q, Q) > \epsilon_{m_n}\}) \le c_1 e^{-c_2 m_n \epsilon_{m_n}^2} < n^{-A}. \qquad ■$$
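
The concentration step uses Bernstein's inequality for the binomial count $N_j$. The Python sketch below compares the exact lower tail $P(N_j < \mu_j/2)$ with the Bernstein bound $\exp\{-t^2/(2(n\sigma_j^2 + t/3))\}$, $t = \mu_j/2$, for one illustrative choice of $n$ and $\pi_j$ (the numbers are assumptions, not values from the paper).

```python
import math


def binom_cdf(n, p, k):
    """P(N <= k) for N ~ Binomial(n, p), with terms computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def bernstein_lower_tail(n, p):
    """Bernstein bound on P(N - mu <= -mu/2) for N ~ Binomial(n, p), mu = n p."""
    mu = n * p
    t = mu / 2
    return math.exp(-t ** 2 / (2 * (n * p * (1 - p) + t / 3)))


n, p = 2000, 0.01                  # mu = 20 expected observations in the slab
k = math.ceil(n * p / 2) - 1       # N < mu/2  <=>  N <= k
exact = binom_cdf(n, p, k)
bound = bernstein_lower_tail(n, p)
print(exact, bound)
```

The bound is loose but decays exponentially in $\mu_j$, which is all the union bound over the $N$ slabs needs.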



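Step 4c. Relating Hausdorff Distance to Hellinger Distance Within a Slab.

Lemma 20 For each $M_1, M_2 \in \mathcal{M}_n$, $H(M_1 \cap R_j, M_2 \cap R_j) \le C\, h^2(Q_{j1}, Q_{j2})$.

Proof Let $g_1$ and $g_2$ be defined as in Lemma 16, and let $\gamma = H(M_1 \cap R_j, M_2 \cap R_j)$. There exists $x \in \mathcal{I}_j$ such that $g_1(x) \in M_1$, $g_2(x) \in M_2$ and $\|g_1(x) - g_2(x)\| = \gamma$. We claim there exists $\mathcal{I}' \subset \mathcal{I}_j$ such that $\inf_{x \in \mathcal{I}'} \|g_1(x) - g_2(x)\| \ge \gamma/2$ and such that $V(\mathcal{I}') \ge c\delta_n^d$. This follows since $g_1$ and $g_2$ are smooth, they both lie in a slab of size $a_n$ around $\mathcal{I}_j$, and the angle between the tangent of $g_j(x)$ and $\mathcal{I}_j$ is bounded by $\pi/4$.

Create a modified manifold $M_2'$ such that $M_2'$ differs from $M_1$ over $\mathcal{I}'$ by a $\gamma/2$ shift orthogonal to $\mathcal{I}_j$ and such that $M_2'$ is otherwise equal to $M_1$. It follows that $\ell_1(M_1, M_2) \ge \ell_1(M_1, M_2')$ and $h(Q_1, Q_2) \ge h(Q_1, Q_2')$.

Every point in the support of the conditioned distributions can be written as an ordered pair $(x, y)$, where $x \in \mathcal{I}_j$ and $y$ lies in a $(D-d)$-dimensional ball of radius $\sigma$. $M_2'$ is shifted a distance of $\gamma/2$ in the direction orthogonal to $\mathcal{I}_j$. As a result, the $\ell_1$ distance between $M_1$ and $M_2'$ equals the integral over $\mathcal{I}'$ of the volume difference between two $(D-d)$-dimensional balls of the same radius that are shifted by $\gamma/2$ relative to each other. This volume difference is of order $\gamma$. Hence, $V\big(((M_1 \oplus \sigma) \cap R_j) \,\triangle\, ((M_2' \oplus \sigma) \cap R_j)\big) \ge c\gamma\delta_n^d$. Let $A = \{x \in \mathcal{I}_j : q_1 > 0, q_2' = 0\}$, $B = \{x \in \mathcal{I}_j : q_1 > 0, q_2' > 0\}$ and $\mathcal{C} = \{x \in \mathcal{I}_j : q_1 = 0, q_2' > 0\}$. At least one of $A$ or $\mathcal{C}$ has volume at least $c\gamma\delta_n^d/2$. Without loss of generality, assume that it is $A$. Then, since the conditional density $q_1$ is of order at least $C_*/\delta_n^d$ on its support,

$$h^2(Q_1, Q_2) \ge h^2(q_1, q_2') = \int (\sqrt{q_1} - \sqrt{q_2'})^2 \ge \int_A (\sqrt{q_1} - \sqrt{q_2'})^2 = \int_A q_1 \ge c\, C_* \frac{\gamma\delta_n^d}{\delta_n^d} = cC_*\gamma = cC_* H(M_1 \cap R_j, M_2 \cap R_j). \qquad ■$$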
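
The step "this volume difference is of order $\gamma$" can be checked in the simplest case where the fibers are disks ($D - d = 2$, an assumption for illustration): two disks of radius $\sigma$ whose centers are $s$ apart have a symmetric difference of area approximately $4\sigma s$ for small $s$, i.e. linear in the shift. A short Python check using the standard lens-area formula for two intersecting circles:

```python
import math


def disk_symmetric_difference(sigma, s):
    """Area of B(0, sigma) symmetric-difference B(s e_1, sigma) in the plane, for 0 <= s < 2 sigma."""
    lens = (2 * sigma ** 2 * math.acos(s / (2 * sigma))
            - (s / 2) * math.sqrt(4 * sigma ** 2 - s ** 2))   # area of the overlap
    return 2 * (math.pi * sigma ** 2 - lens)


sigma = 0.3
for s in (0.1, 0.01, 0.001):
    print(s, disk_symmetric_difference(sigma, s) / (4 * sigma * s))  # ratio tends to 1
```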



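Step 4d. The Hausdorff Rate.

Lemma 21 For any $A > 0$ there exists $C_0$ such that, for all $Q \in \mathcal{Q}_n$ and all large $n$,

$$Q^n\left(\left\{H(\hat M_j, M \cap R_j) > \left(\frac{C_0 \log n}{n}\right)^{\frac{2}{2+d}}\right\}\right) \le n^{-A}.$$

Proof This follows by combining Lemma 20 and Lemma 19. ■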



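Step 5: Final Estimator

Now we can combine the estimators from the different slabs. Let $\hat M = \bigcup_{j=1}^N \hat M_j$. Recall that the number of slabs is $N = (c\delta_n)^{-d} = (Cn/\log n)^{d/(2+d)}$.

Proof of Theorem 2 Choose $A > 1$, so that $A - d/(2+d) > 2/(2+d)$. We have

$$Q^n\left(H(\hat M, M) > \left(\frac{C_0 \log n}{n}\right)^{\frac{2}{2+d}}\right) \le \sum_{j=1}^N Q^n\left(H(\hat M_j, M \cap R_j) > \left(\frac{C_0 \log n}{n}\right)^{\frac{2}{2+d}}\right) \le \frac{N}{n^A} = \left(\frac{Cn}{\log n}\right)^{\frac{d}{2+d}} \frac{1}{n^A} \le \frac{c}{n^{A - d/(2+d)}}.$$

Let $r_n = \left(\frac{C_0 \log n}{n}\right)^{2/(2+d)}$. Since $\hat M$ and $M$ are contained in a compact set, $H(\hat M, M)$ is uniformly bounded above by a constant $K_0$. Hence,

$$E_Q H(\hat M, M) = E_Q[H(\hat M, M) I(H(\hat M, M) > r_n)] + E_Q[H(\hat M, M) I(H(\hat M, M) \le r_n)]$$

$$\le K_0\, Q^n(H(\hat M, M) > r_n) + r_n \le \frac{K_0 c}{n^{A - d/(2+d)}} + r_n = O\left(\left(\frac{\log n}{n}\right)^{\frac{2}{2+d}}\right). \qquad ■$$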
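
The final bound trades the tail term $K_0 N n^{-A}$ against $r_n$. The Python sketch below evaluates both with all unspecified constants ($C_0$, $C$, $K_0$) set to one, a purely illustrative assumption, for $d = 2$ and $A = 2$; the ratio of the tail term to $r_n$ goes to zero, so the risk is of order $r_n$.

```python
import math


def risk_terms(n, d, A, C0=1.0, C=1.0, K0=1.0):
    """Return (r_n, K0 * N * n^(-A)) from the proof of Theorem 2.
    C0, C and K0 are placeholder constants, set to 1 for illustration."""
    r_n = (C0 * math.log(n) / n) ** (2 / (2 + d))
    N = (C * n / math.log(n)) ** (d / (2 + d))   # number of slabs
    return r_n, K0 * N * n ** (-A)


for n in (10 ** 4, 10 ** 6, 10 ** 8):
    r_n, tail = risk_terms(n, d=2, A=2.0)
    print(n, r_n, tail, tail / r_n)  # the ratio tail / r_n shrinks toward 0
```

For $d = 2$ and $A = 2$ the ratio equals $1/(n\log n)$ exactly, so the tail term is negligible.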



5. A Simple, Consistent Estimator

Here we give a practical, consistent estimator, one that does not converge at the optimal rate. It is a generalization of the estimator in Genovese et al. (2010) and is similar to the estimator in Niyogi et al. (2006). Let

$$\hat S = \bigcup_{i=1}^n B_D(Y_i, \epsilon) \qquad (30)$$

and define $\partial\hat S = \partial(\hat S)$, $\hat\sigma = \max_{y \in \hat S} d(y, \partial\hat S)$ and

$$\hat M = \left\{y \in \hat S : d(y, \partial\hat S) \ge \hat\sigma - 2\epsilon\right\}. \qquad (31)$$
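
The estimator (30)-(31) is easy to prototype. The Python sketch below is a grid approximation in the plane ($D = 2$): it marks grid points covered by $\hat S$, approximates $d(y, \partial\hat S)$ by the distance to the nearest grid point outside $\hat S$, and keeps the points whose depth is at least $\hat\sigma - 2\epsilon$. For a reproducible example it uses a deterministic stand-in for a noisy sample (points of the unit circle pushed along the normal by evenly spaced offsets in $[-\sigma, \sigma]$) rather than a random draw; the function names, grid step and parameter values are all illustrative assumptions.

```python
import math
from collections import defaultdict


def union_of_balls_estimator(samples, eps, step):
    """Grid sketch of S_hat = union of B(Y_i, eps) and
    M_hat = {y in S_hat : d(y, boundary of S_hat) >= sigma_hat - 2 eps} in the plane.
    Returns (grid points of M_hat, sigma_hat)."""
    # Bucket samples into eps-cells so a membership test only scans the 3 x 3 neighbouring cells.
    buckets = defaultdict(list)
    for s in samples:
        buckets[(math.floor(s[0] / eps), math.floor(s[1] / eps))].append(s)

    def in_s_hat(p):
        bx, by = math.floor(p[0] / eps), math.floor(p[1] / eps)
        return any(math.dist(p, s) <= eps
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   for s in buckets.get((bx + dx, by + dy), ()))

    # Grid padded by 2 eps, so its outer rim lies outside S_hat.
    x0 = min(s[0] for s in samples) - 2 * eps
    y0 = min(s[1] for s in samples) - 2 * eps
    nx = int((max(s[0] for s in samples) + 2 * eps - x0) / step) + 1
    ny = int((max(s[1] for s in samples) + 2 * eps - y0) / step) + 1

    def coords(c):
        return (x0 + c[0] * step, y0 + c[1] * step)

    inside = {(i, j) for i in range(nx) for j in range(ny) if in_s_hat(coords((i, j)))}
    # The outside grid point nearest to an inside one always has an inside 4-neighbour,
    # so it suffices to search these "rim" points.
    rim = [coords((i, j)) for i in range(nx) for j in range(ny)
           if (i, j) not in inside
           and any((i + di, j + dj) in inside for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    depth = {c: min(math.dist(coords(c), r) for r in rim) for c in inside}
    sigma_hat = max(depth.values())
    m_hat = [coords(c) for c, dep in depth.items() if dep >= sigma_hat - 2 * eps]
    return m_hat, sigma_hat


def noisy_circle(n_angles, n_offsets, sigma):
    """Deterministic stand-in for a noisy sample: unit-circle points moved radially by
    evenly spaced offsets in [-sigma, sigma]."""
    pts = []
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        for m in range(n_offsets):
            r = 1.0 + sigma * (2 * m / (n_offsets - 1) - 1)
            pts.append((r * math.cos(t), r * math.sin(t)))
    return pts


samples = noisy_circle(160, 9, sigma=0.3)
m_hat, sigma_hat = union_of_balls_estimator(samples, eps=0.08, step=0.04)
print(len(m_hat), sigma_hat)
print(max(abs(math.hypot(x, y) - 1.0) for x, y in m_hat))  # at most 4 * eps, as in Lemma 22
```

The grid depth slightly overestimates the true distance to $\partial\hat S$, so this is only a qualitative illustration of how $\hat M$ peels the noisy tube back toward the circle.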
Lemma 22 Let $\epsilon_n = C(\log n/n)^{1/D}$ in the estimator $\hat M$. Then

$$H(\hat M, M) = O\left(\left(\frac{\log n}{n}\right)^{1/D}\right) \qquad (32)$$

almost surely for all large $n$.



Before proving the lemma we need a few definitions. Following Cuevas and Rodríguez-Casal (2004), we say that a set $S$ is $(\chi, \lambda)$-standard if there exist positive numbers $\chi$ and $\lambda$ such that

$$v_D(B_D(y, \epsilon) \cap S) \ge \chi\, v_D(B_D(y, \epsilon)) \quad \text{for all } y \in S,\ 0 < \epsilon \le \lambda. \qquad (33)$$

We say that $S$ is partly expandable if there exist $r > 0$ and $R \ge 1$ such that $H(\partial S, \partial(S \oplus \epsilon)) \le R\epsilon$ for all $0 < \epsilon < r$. A standard set has no sharp peaks, while a partly expandable set has no deep inlets.

Lemma 23 If $\sigma < \Delta(M)$, then $S = M \oplus \sigma$ is standard with $\chi = 2^{-D}$ and $\lambda = \sigma$, and partly expandable with $r = \Delta(M) - \sigma$ and $R = 1$.

Proof Let $\chi = 2^{-D}$. Let $y$ be a point in $S$ and let $\Delta(y) \le \sigma$ be its distance from the boundary $\partial S$. If $\Delta(y) \ge \epsilon$, then $B_D(y, \epsilon) \subset S$, so that $v_D(B_D(y, \epsilon) \cap S) = v_D(B_D(y, \epsilon)) \ge \chi\, v_D(B_D(y, \epsilon))$.

Suppose that $\Delta(y) < \epsilon$. Let $v$ be a point on the manifold closest to $y$ and let $y^*$ be the point on the segment joining $y$ to $v$ such that $\|y - y^*\| = \epsilon/2$. The ball $A = B_D(y^*, \epsilon/2)$ is contained in both $B_D(y, \epsilon)$ and $S$. Hence, $v_D(B_D(y, \epsilon) \cap S) \ge v_D(A) = \chi\, v_D(B_D(y, \epsilon))$. This is true for all $\epsilon \le \sigma$; hence $S$ is $(\chi, \lambda)$-standard with $\chi = 1/2^D$ and $\lambda = \sigma$.

Now we show that $S$ is partly expandable. By Proposition 1 in Cuevas and Rodríguez-Casal (2004), it suffices to show that a ball of radius $r$ rolls freely outside $S$ for some $r$, meaning that, for each $y \in \partial S$, there is an $a$ such that $y \in B(a, r) \subset S^c$, where $S^c$ is the complement of $S$. Let $O_y$ be the ball of radius $\Delta(M) - \sigma$ tangent to $\partial S$ at $y$ such that $O_y \subset S^c$. Such a ball exists by virtue of the fact that $\sigma < \Delta(M)$. ■
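
The standardness constant $\chi = 2^{-D}$ of Lemma 23 is conservative. The Python sketch below estimates, by midpoint-rule quadrature, the fraction of a disk $B(y, \epsilon)$ that lies in the annulus $S = M \oplus \sigma$ around the unit circle ($D = 2$, $\sigma = \epsilon = 0.3$, illustrative values), for $y$ on the outer boundary, on the inner boundary, and on the circle itself; each fraction exceeds $2^{-2} = 1/4$.

```python
import math


def fraction_in_annulus(y, eps, r_in, r_out, m=300):
    """Fraction of the disk B(y, eps) inside the annulus r_in <= ||z|| <= r_out,
    estimated with an m x m midpoint grid over the disk's bounding square."""
    h = 2 * eps / m
    in_ball = in_both = 0
    for i in range(m):
        for j in range(m):
            z = (y[0] - eps + (i + 0.5) * h, y[1] - eps + (j + 0.5) * h)
            if math.dist(z, y) <= eps:
                in_ball += 1
                if r_in <= math.hypot(*z) <= r_out:
                    in_both += 1
    return in_both / in_ball


sigma = eps = 0.3
for y in ((1.0 + sigma, 0.0), (1.0 - sigma, 0.0), (1.0, 0.0)):
    print(y, fraction_in_annulus(y, eps, 1.0 - sigma, 1.0 + sigma))
```

Points on the boundary keep a bit less than (outer side) or a bit more than (inner side) half of the disk, well above the guaranteed $1/4$.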



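Theorem 24 (Cuevas and Rodríguez-Casal (2004)) Let $Y_1, \dots, Y_n$ be a random sample from a distribution with support $S$. Let $S$ be compact, $(\chi, \lambda)$-standard and partly expandable. Let

$$\hat S = \bigcup_{i=1}^n B(Y_i, \epsilon_n) \qquad (34)$$

and let $\partial\hat S$ be the boundary of $\hat S$. Let $\epsilon_n = C(\log n/n)^{1/D}$ with $C > (2/(\chi\omega_D))^{1/D}$, where $\omega_D = V(B_D(0, 1))$. Then, with probability one,

$$H(S, \hat S) \le C\left(\frac{\log n}{n}\right)^{1/D} \quad \text{and} \quad H(\partial S, \partial\hat S) \le C\left(\frac{\log n}{n}\right)^{1/D} \qquad (35)$$

for all large $n$. Also, $S \subset \hat S$ almost surely for all large $n$.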
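Proof of Lemma 22 Theorem 24 and Lemma 23 imply that $H(S, \hat S) \le C(\log n/n)^{1/D}$ and $H(\partial S, \partial\hat S) \le C(\log n/n)^{1/D}$. It follows that $|\hat\sigma - \sigma| \le \epsilon$. First we show that $y \in \hat M$ implies that $d(y, M) \le 4\epsilon$. Let $y \in \hat M$. Then $d(y, \partial S) \ge d(y, \partial\hat S) - \epsilon \ge \hat\sigma - 2\epsilon - \epsilon \ge \sigma - \epsilon - 2\epsilon - \epsilon = \sigma - 4\epsilon$. So $d(y, M) = \sigma - d(y, \partial S) \le \sigma - \sigma + 4\epsilon = 4\epsilon$. Now we show that $M \subset \hat M$. Suppose that $y \in M$. Then

$$d(y, \partial\hat S) \ge d(y, \partial S) - \epsilon = \sigma - \epsilon \ge \hat\sigma - 2\epsilon,$$

so that $y \in \hat M$. ■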

6. Conclusion and Open Questions

We have established that the optimal rate for estimating a smooth manifold in Hausdorff distance is $n^{-\frac{2}{2+d}}$. We conclude with some comments and open questions.

1. We have assumed that the noise is perpendicular to the manifold. In current work we are deriving the minimax rate under the more general assumption that the noise is drawn from a general, spherically symmetric distribution. We also allow the distribution along the manifold to be any smooth density bounded away from 0. The rates are quite different and the methods for proving the rates are substantially more involved. Moreover, the rates depend on the behavior of the noise density near the boundary of its support. We will report on this elsewhere.

2. Perhaps the most important open question is to find a computationally tractable estimator that achieves the optimal rate. It is possible that combining the estimator in Section 5 with one of the estimators in the computational geometry literature (Dey (2006)) could work. However, it appears that some modification of such an estimator is needed. This is a difficult question which we hope to address in the future.

3. It is interesting to note that Niyogi et al. (2006) consider a Gaussian noise distribution. While it is possible to infer the homology of $M$ with Gaussian noise, it is not possible to infer $M$ itself with any accuracy. The reason is that manifold estimation is similar to (and in fact, more difficult than) nonparametric regression with measurement error. In that case, it is well known that the fastest possible rates under Gaussian noise are logarithmic. This highlights an important distinction between estimating the topological structure of $M$ and estimating $M$ in Hausdorff distance.

4. The current results take $\Delta(M)$, $d$ and $\sigma$ as known (or at least bounded by known constants). In practice these must be estimated. We do not know whether there exist minimax estimators that are adaptive over $d$, $\Delta(M)$ and $\sigma$.
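7. Appendix

7.1 Proof of Equation 13

We will use the following two results (see Section 2.4 of Tsybakov (2008)):

$$h^2(P^n, Q^n) = 2\left(1 - \left(1 - \frac{h^2(P, Q)}{2}\right)^n\right) \qquad (36)$$

and

$$\int \min(p, q) \ge \frac{1}{2}\left(1 - \frac{h^2(P, Q)}{2}\right)^2. \qquad (37)$$

We have

$$\int \min(p^n, q^n) \ge \frac{1}{2}\left(1 - \frac{h^2(P^n, Q^n)}{2}\right)^2 = \frac{1}{2}\left(1 - \frac{h^2(P, Q)}{2}\right)^{2n} \ge \frac{1}{2}\left(1 - \frac{\ell_1(P, Q)}{2}\right)^{2n},$$

since $h^2(P, Q) \le \ell_1(P, Q)$.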
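
Identity (36) and inequality (37) are easy to check numerically for small discrete examples. The Python sketch below uses two Bernoulli distributions and $n = 3$ (illustrative choices), computes the product measures by enumeration, and compares both sides; here $h^2(P, Q) = \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$.

```python
import itertools
import math


def hellinger_sq(p, q):
    """h^2(P, Q) = sum_i (sqrt(p_i) - sqrt(q_i))^2, which lies in [0, 2]."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def product_measure(dist, n):
    """Probabilities of all n-tuples of i.i.d. draws, in a fixed enumeration order."""
    return [math.prod(c) for c in itertools.product(dist, repeat=n)]


P, Q, n = [0.6, 0.4], [0.5, 0.5], 3
Pn, Qn = product_measure(P, n), product_measure(Q, n)
lhs = hellinger_sq(Pn, Qn)
rhs = 2 * (1 - (1 - hellinger_sq(P, Q) / 2) ** n)          # identity (36)
overlap = sum(min(a, b) for a, b in zip(Pn, Qn))            # integral of min(p^n, q^n)
lower = 0.5 * (1 - hellinger_sq(Pn, Qn) / 2) ** 2           # right-hand side of (37) for P^n, Q^n
print(lhs, rhs, overlap, lower)
```

Identity (36) holds because the Hellinger affinity $1 - h^2/2 = \sum_i\sqrt{p_iq_i}$ multiplies over independent coordinates.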

7.2 Proof of Theorem 6

We define two manifolds $M_0$ and $M_1$ with corresponding distributions $Q_0$ and $Q_1$ such that (i) $\Delta(M_i) \ge \kappa$, $i = 0, 1$, (ii) $H(M_0, M_1) = \gamma$ and (iii) the volume of $S_0 \triangle S_1$ is of order $\gamma^{\frac{d+2}{2}}$, where $S_i$ is the support of $Q_i$.

We write a generic $D$-dimensional vector as $y = (u, v, z)$, with $u \in \mathbb{R}^d$, $v \in \mathbb{R}$, $z \in \mathbb{R}^{D-d-1}$. Define the disk in $\mathbb{R}^{d+1}$

$$\mathcal{D} = \{(u, 0) \in \mathbb{R}^{d+1} : u \in B_d(0, 1)\}$$

and let

$$F = \partial\left(\bigcup_{(u,v) \in \mathcal{D}} B_{d+1}((u, v), \kappa)\right).$$

Now define the following $d$-dimensional manifold in $\mathbb{R}^D$:

$$M_0 = \{(u, v, 0_{D-d-1}) : (u, v) \in F\} = \{(u, a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa)\} \cup \{(u, -a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa)\},$$

where

$$a(u) = \begin{cases} \kappa & \text{if } \|u\| \le 1, \\ \sqrt{\kappa^2 - (\|u\| - 1)^2} & \text{if } 1 < \|u\| \le 1 + \kappa. \end{cases}$$

The manifold $M_0$ has no boundary and, by construction, $\Delta(M_0) \ge \kappa$.

Now define a second manifold that coincides with $M_0$ except for a small perturbation:

$$M_1 = \{(u, b(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa)\} \cup \{(u, -a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa)\}$$
where

$$b(u) = \begin{cases} \gamma + \sqrt{\kappa^2 - \|u\|^2} & \text{if } \|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2}, \\ 2\kappa - \sqrt{\kappa^2 - \left(\|u\| - \sqrt{4\gamma\kappa - \gamma^2}\right)^2} & \text{if } \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2}, \\ a(u) & \text{if } \sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le 1 + \kappa. \end{cases}$$

Note that $\Delta(M_1) \ge \kappa$ since the perturbation is obtained using portions of spheres of radius $\kappa$. In fact:

• for $\|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2}$, $b(u)$ is the $(d+1)$-th coordinate of the "upper" portion of the $(d+1)$-dimensional sphere with radius $\kappa$ centered at $(0, \dots, 0, \gamma)$; hence $b(u)$ satisfies $\|u\|^2 + (b(u) - \gamma)^2 = \kappa^2$ with $b(u) \ge \gamma$;

• for $\frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2}$, $b(u)$ is the $(d+1)$-th coordinate of the "lower" portion of the $(d+1)$-dimensional sphere with radius $\kappa$ centered at $\left(u\sqrt{4\gamma\kappa - \gamma^2}/\|u\|,\ 2\kappa\right)$ (note that the center of the sphere differs according to the direction of $u$); hence $b(u)$ satisfies

$$\left\|u - \frac{u}{\|u\|}\sqrt{4\gamma\kappa - \gamma^2}\right\|^2 + (b(u) - 2\kappa)^2 = \kappa^2 \quad \text{with } b(u) \le 2\kappa.$$

To summarize, $M_0$ and $M_1$ are both manifolds with no boundary, $\Delta(M_0) \ge \kappa$ and $\Delta(M_1) \ge \kappa$. See Figure 5. Now

$$E_0 = M_0 - M_1 = \{(u, a(u), 0_{D-d-1}) : u \in B_d(0, \sqrt{4\gamma\kappa - \gamma^2})\},$$

$$E_1 = M_1 - M_0 = \{(u, b(u), 0_{D-d-1}) : u \in B_d(0, \sqrt{4\gamma\kappa - \gamma^2})\}.$$

Note that for each point $y \in E_0$ there exists $y' \in E_1$ such that $\|y - y'\| \le |a(u) - b(u)| \le \gamma$. Also, $y_0 = (0, a(0), 0) \in M_0$ has as its closest $M_1$ point $y_1 = (0, b(0), 0)$, so that $\|y_0 - y_1\| = \gamma$. Hence $H(M_0, M_1) = H(E_0, E_1) = \gamma$.
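
The profiles $a$ and $b$ are easy to tabulate. The Python sketch below implements them for $d = 1$ (so $u$ is a scalar) with illustrative values $\kappa = 0.5$ and $\gamma = 0.01$, which satisfy the implicit requirement $\sqrt{4\gamma\kappa - \gamma^2} < 1$. It checks that $b$ is continuous at the two break points, that $0 \le b - a \le \gamma$, and that the maximal gap $\gamma$ is attained at $u = 0$.

```python
import math


def a(u, kappa):
    """Profile of M_0: flat at height kappa over |u| <= 1, quarter circles of radius kappa beyond."""
    r = abs(u)
    return kappa if r <= 1 else math.sqrt(max(kappa ** 2 - (r - 1) ** 2, 0.0))


def b(u, kappa, gamma):
    """Perturbed profile of M_1 (assumes sqrt(4 gamma kappa - gamma^2) < 1)."""
    r = abs(u)
    rho = math.sqrt(4 * gamma * kappa - gamma ** 2)
    if r <= rho / 2:   # upper arc of the sphere of radius kappa centred at height gamma
        return gamma + math.sqrt(kappa ** 2 - r ** 2)
    if r <= rho:       # lower arc of the sphere of radius kappa centred at (rho, 2 kappa)
        return 2 * kappa - math.sqrt(kappa ** 2 - (r - rho) ** 2)
    return a(u, kappa)


kappa, gamma = 0.5, 0.01
rho = math.sqrt(4 * gamma * kappa - gamma ** 2)
grid = [-(1 + kappa) + 2 * (1 + kappa) * i / 3000 for i in range(3001)]
gaps = [b(u, kappa, gamma) - a(u, kappa) for u in grid]
print(b(0.0, kappa, gamma) - a(0.0, kappa), max(gaps), min(gaps))
```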

To find an upper bound for $V(S_0 \triangle S_1)$, we show that each $y = (u, v, z) \in S_1 - S_0$ satisfies the following conditions:

(i) $u \in B_d(0, \sqrt{4\gamma\kappa - \gamma^2})$;

(ii) $z \in B_{D-d-1}(0, \sigma)$;

(iii) $\kappa + \sigma - \|z\| < v \le \kappa + \gamma + \sigma - \|z\|$.

If $y = (u, v, z)$ belongs to $S_1$ and has $\|u\| \ge \sqrt{4\gamma\kappa - \gamma^2}$, then there is a point of $M_0 \cap M_1$ within distance $\sigma$; hence $y \notin S_1 - S_0$. This proves (i). Before proving (ii) and (iii), note that if $u \in B_d(0, \sqrt{4\gamma\kappa - \gamma^2})$, then

$$\kappa = a(u) \le b(u) \le \kappa + \gamma.$$

Now, let $y' = (u', b(u'), 0) \in E_1$ be the point of $M_1$ closest to $y$. We have

$$d(y, M_1) = \|y - y'\| = \|u - u'\| + |v - b(u')| + \|z\| \le \sigma.$$
Figure 5: One section of manifolds $M_0$ and $M_1$. The common part is dashed, $E_0$ is dotted and $E_1$ solid. $R_1$ and $R_2$ denote the regions where the different definitions of the perturbation apply: $R_1$ is $\|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2}$, while $R_2$ denotes $\frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2}$.

This gives condition (ii) above, $\|z\| \le \sigma$, and also

$$|v - b(u')| \le \sigma - \|z\|. \qquad (38)$$

Since $b(u') \le \kappa + \gamma$, we obtain

$$v \le b(u') + \sigma - \|z\| \le \kappa + \gamma + \sigma - \|z\|,$$

which is the right inequality in (iii). Finally,

$$\sigma < d(y, M_0) \le \|y - (u, a(u), 0)\| = |v - a(u)| + \|z\|,$$

which implies either $v < a(u) - (\sigma - \|z\|)$ or $v > a(u) + (\sigma - \|z\|)$. The former inequality would imply

$$v < a(u) - (\sigma - \|z\|) = \kappa - (\sigma - \|z\|) \le \inf_{u'} b(u') - (\sigma - \|z\|),$$

so that $|v - b(u')| > \sigma - \|z\|$ for all $u'$, which contradicts (38). Hence we have

$$v > a(u) + (\sigma - \|z\|) = \kappa + (\sigma - \|z\|),$$

which is the left inequality in (iii). As a consequence,

$$S_1 - S_0 \subset B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right) \times \left\{(v, z) \in \mathbb{R}^{D-d} : \kappa + \sigma - \|z\| < v \le \kappa + \gamma + \sigma - \|z\|,\ z \in B_{D-d-1}(0, \sigma)\right\}$$

and

$$V(S_1 - S_0) \le C \cdot \left(\sqrt{4\gamma\kappa - \gamma^2}\right)^d \cdot \gamma \cdot \sigma^{D-d-1}.$$

Hence, $V(S_1 - S_0) = O(\gamma^{\frac{d}{2}+1})$.

With similar arguments one can show that $V(S_0 - S_1) = O(\gamma^{\frac{d}{2}+1})$, so that

$$V(S_0 \triangle S_1) = O(\gamma^{\frac{d}{2}+1}).$$

It then follows that $\int |q_0 - q_1| = O(\gamma^{(d+2)/2})$.